Jim Lux's blog: Judging the International Science and Engineering Fair (ISEF)

A non-food post, for a change.

I just returned from spending several days in Pittsburgh, PA (a nice place, green and hilly) judging the 2012 Intel International Science and Engineering Fair, known as ISEF. This is the big deal in the high school science fair world, with about 1000 projects from all over the world. By this time, the projects have competed at a school or city fair, and then at a regional or state fair, to be selected to compete at ISEF, so the standard of performance is pretty high: as the director of judging said at the judges' orientation, they are comparable to a decent journal article or master's thesis.

In this post, I'm talking about what they call the "grand awards", the first, second, etc. place ribbons (and cash), as opposed to what are called "special awards". About 25% of the projects will receive a "place" (first, second, third, fourth), with a few firsts, more seconds, and the rest thirds and fourths. There's also a "best in category": the first of the firsts. The special awards are judged and presented by various sponsoring organizations (like the US Air Force, IEEE, or King Faisal), and they use their own criteria and methods to pick winners.

There were some 800-900 grand award judges for the 1000 projects, which are divided into a couple dozen categories of various sizes. My category, Engineering-Mechanical and Electrical, was the largest, with about 120 projects and about 70-80 judges. So how do we go about picking the winners? It's really done in two steps. First, judges interview the finalists in front of their project displays and submit scores (0-100). Then all the scores are tabulated, and all the judges meet in one room to caucus and pick the winners. What's interesting to most people is that the scores do NOT directly determine the winners; they mostly help the judges decide which projects to talk about and discuss.

Each judge only interviews a small fraction of the projects (less than 10%) on the one day of judging. We have 17 "appointment slots" that are 15 minutes long, and we're assigned a specific project in each slot (or we have an empty slot). This year I judged 10 projects, and last year a few more. The idea is that each project gets judged by at least 6-7 judges, and hopefully a few more, and that there's a sort of random spread of judges around the projects. I'm an electronics kind of guy, so I got all projects that had electronics involved in some way, but I only got one of the several projects dealing with antennas.

The night before

The night before judging day, we get our slot assignments, and we go out to the exhibit floor to see the project displays. Most of us have spent the afternoon looking at ALL the projects (without the finalists present), and that evening, you look especially at your assigned dozen, so that you can kind of calibrate yourself on what else is happening. There's some casual conversation among the judges about the projects, and this is where you can find out if there's something special you should be asking about the next day. I read the abstracts, look at the board, and make some notes about what I want to ask. There's a form on the display if they worked at a research institution or were working with a scientist or specialist. In those cases, I want to see what the finalist did and whether what they did was their idea, or at the direction of someone else. Working with a team is ok. Being third assistant bottle washer, not so ok.

It's also a chance to get an assignment changed: maybe you know one of the finalists, and you don't want a conflict of interest. I suspect the other reason they want us there the night before is to be able to deal with no-show judges. It happens: planes don't arrive, other commitments take precedence, people get sick or injured. When you have hundreds of judges, you can pretty much guarantee there will be some with a problem. I do not envy the fair's staff their job working all these issues. We're all volunteers (we pay our own way to the fair and for lodging, etc.) and I'm impressed at the level of commitment.

You can also do a bit of strategic googling when you get back to the hotel. I didn't google the actual finalists (some judges do, but I didn't bother), but googling to find out the "current state of the art" in the topic areas is useful. It's a poor finalist who isn't aware of what other people are doing in their field of inquiry: If I can google it, so can they.

One of the interesting aspects of judge training and orientation is that they warn us about a couple of aspects of modern science fair finalists. All these kids are scary smart, and it is VERY competitive. With modern smartphones and internet access, it's entirely possible that if you ask a question they can't answer, they will get online and research it before the next judge comes around. The "educated BSer" problem has grown by leaps and bounds. It used to be that you could "test" a finalist by asking a few key background knowledge questions to see if they knew the field, but now, if you're interviewing in the afternoon, you'll get a different answer than a judge did first thing in the morning. This has always been a problem, and after judging a while, you learn to ask questions that don't telegraph the correct answer.

We were also warned that the finalists would google us. They get their judges' names first thing in the morning, and apparently it's now standard practice among finalists to do a quick background check on their judges. I don't know whether anyone googled me; I didn't hear anything during judging that suggested it.

The day of judgement

We do our interviews. A PA announcement says "now is the time when you should be interviewing the project in schedule slot N." Periodically through the day, you fill out your scan form with your numeric scores and turn it in. We have to score on a 0-100 scale, but every judge has their own scheme for this. Some judge easy, some judge hard. Some spread their scores (I do; my scores run from the 20s to the 90s), some don't. A lot of judges basically score with a median of 75 and a 50-100 range. They've tried various training schemes to get judges to score consistently, but by now they know that doesn't work, so they use a different tabulation scheme (which I'll describe later).

The interview process is sort of ad hoc. For entrants who don't speak English, there are translators of varying proficiency, but overall I don't think anyone really suffers from being a non-English speaker. Even in English, some finalists are voluble and talkative, and others are pretty quiet. It's probably pretty overwhelming for them: they've flown halfway across the world (or across town), and they're getting grilled by a dozen people over a day in 15-minute shots. It's kind of like interviewing for 10 different jobs in one day. Both the judges and the finalists worry about missing some essential piece of information that might make or break the project. "Darn, I forgot to ask about X. I hope some other judge asked about it."

When I interview, I ask my questions, which tend to be pretty specific. Contrary to science fair lore, most judges don't start with "tell me about your project", because that starts what we call "the tape recorder". Sure, finalists all have a rehearsed capsule version (an elevator pitch, if you will). Presumably, though, they've put that on the display, so I don't want to burn valuable interview time on it. Learn it well, though; you'll need it to explain your project to others (e.g. the news media). But the judges are good at "stopping the tape": we even get suggestions in training on how to do it, and we compare notes about ways that work well.

I usually have picked out a few things from their display that I'm interested in, and I'll ask about them. "Why did you decide to do X?" is a question that all judges use. They want to know what you did, and that you did what you did for a reason, not just happenstance. Everyone has some sort of "origin story" for why they did that particular project, some more interesting than others. I don't know that the "my friend was injured in a motorcycle accident" or "my uncle suffered from X" works any better than "I was fooling around and noticed this odd thing happened". We're all engineers and we love to solve problems, and we want finalists to be the same. The kiss of death is "my teacher, adviser, brother's professor gave me a list of 3 projects and I chose this one". You don't get many points for creativity for "doing homework problems".

I don't score as I interview. I take notes (really important for the caucus later) about what's good, bad, or particularly interesting. Then, at a break, I do my scoring. I (and most judges) look at 3 or 4 projects before doing any scoring, so we don't box ourselves in by giving the first project a 90, only to find it's the lamest of the lame.

They provide a rubric dividing the score into rough percentages for creativity (30%), scientific/engineering method (30%), thoroughness (15%), skill (15%), and clarity (10%). Clarity doesn't get a lot of weight directly, but hey, if I can't understand what you did and why you did it, you're not going to get great scores in the first two heavily weighted buckets. Some judges ignore the weights, some don't. I basically start with about half the credit in each bucket and move it up or down later, based on whether the finalist is better or worse than "average", where average is my take on the general level of competition at the fair (in the category! I'm not comparing against the "cure for all cancer" over in biochem, or the "solution for Fermat's last theorem" in Math).
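To make that bucket arithmetic concrete, here is a minimal Python sketch, assuming each bucket is tracked as a fraction of its maximum points that starts at one half and is nudged up or down. The function name `rubric_score` and the adjustment scale of -0.5 to +0.5 are hypothetical illustrations, not anything the fair actually provides.

```python
# Rubric weights (maximum points per bucket) from the percentages above; they sum to 100.
RUBRIC = {
    "creativity": 30,
    "method": 30,
    "thoroughness": 15,
    "skill": 15,
    "clarity": 10,
}


def rubric_score(adjustments: dict[str, float]) -> float:
    """Hypothetical scoring helper: start every bucket at half credit.

    Each adjustment (assumed range -0.5 .. +0.5) says how much better or worse
    than "average for this category" the finalist was in that bucket. Buckets
    with no adjustment stay at half credit. Each bucket is clamped to 0..max.
    """
    total = 0.0
    for bucket, max_points in RUBRIC.items():
        fraction = 0.5 + adjustments.get(bucket, 0.0)
        fraction = min(max(fraction, 0.0), 1.0)  # keep the bucket within 0..max points
        total += fraction * max_points
    return total


# A "homework assignment" project with a solid method: creativity is bombed to zero.
print(rubric_score({"creativity": -0.5, "method": 0.2}))
```

An untouched project lands at 50, which matches the "start at half credit" habit; everything else is a deliberate move away from the middle.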

There are certain things for which I will bomb the score in a bucket. If the project is a "do the homework assignment", you don't get points for creativity. If the project is "fooling around in the garage with no plan", you don't get points for method, no matter how good you are at machining. If the project is a "snow job" with lots of fancy words and no real content, well, you can guess how well it does.

The Caucus

The caucus is the best and most powerful part of judging. We have a mini caucus a couple of times during the day (before we start judging and at the lunch break), where we can write the numbers of projects we think are of particular interest on flip charts or a blackboard. That's a cue to other judges to take a look at them, or to pay special attention when they interview later. It turns out, though, that the flip chart entries don't actually affect the final results very much, but they do prevent a sterling project from being overlooked, or a lame project (one where you have the subject matter expertise to see its flaws) from getting more credit than it should.

The caucus starts after all the interviews are done, around 5:30 PM. The fair takes all of a judge's individual scores, rank-orders them, and turns them into quintile scores (i.e. high, medium-high, medium, medium-low, low), which has the effect of equalizing the different "spreads" and the easy and hard judges. Then they sum the ranks on some basis and produce a preliminary ranking of all the projects in the category. A project with all high quintiles winds up on top; one with all low quintiles winds up on the bottom.
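The fair's exact combining formula isn't spelled out here ("on some basis"), so the following Python is only a sketch of the general idea. It assumes a project's quintile is set by how many of the same judge's other projects scored strictly lower (so tied scores share a quintile), and it averages quintiles rather than summing them, since projects are seen by different numbers of judges. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def to_quintiles(scores: dict[str, float]) -> dict[str, int]:
    """Map one judge's raw 0-100 scores to quintiles 1 (low) .. 5 (high).

    Only the order of this judge's scores matters, so easy vs. hard graders
    and wide vs. narrow spreads all collapse onto the same 1..5 scale.
    """
    values = list(scores.values())
    n = len(values)
    quintiles = {}
    for project, score in scores.items():
        below = sum(1 for v in values if v < score)  # projects this judge liked less
        quintiles[project] = 1 + (5 * below) // n     # 0..n-1 below -> quintile 1..5
    return quintiles


def preliminary_ranking(by_judge: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Combine every judge's quintiles per project (averaging is an assumption)."""
    collected = defaultdict(list)
    for scores in by_judge.values():
        for project, q in to_quintiles(scores).items():
            collected[project].append(q)
    averaged = {project: mean(qs) for project, qs in collected.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


easy_judge = {"A": 95, "B": 90, "C": 85, "D": 80, "E": 75}
hard_judge = {"A": 60, "B": 50, "C": 40, "D": 30, "E": 20}
print(to_quintiles(easy_judge) == to_quintiles(hard_judge))  # both give A a 5, E a 1
print(preliminary_ranking({"easy": easy_judge, "hard": hard_judge}))
```

With this scheme, an easy judge and a hard judge who put the same five projects in the same order produce identical quintiles, which is the equalizing effect described above.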

Our first task as a panel is to decide who is "in the ribbons" and who isn't. If you're not in the ribbons, it's not worth arguing about whether you are in 50th place or 51st. There's a definite time factor. We have to have our placings nailed down so the fair management can start the "best in show" judging (they work through the night after we're done; we typically finish about 9 PM). The fair incentivizes us to finish by providing free food and (alcoholic) drinks starting around 7:30 PM. Spend too long in caucus, and by the time you get down to the reception, all that's left is crumbs, empty beer bottles, and vacuum cleaners.

The first thing we do is look for outstanding projects that, for some reason, wound up in the bottom of the pile, and for lame projects that, for some reason, wound up in the top of the pile. These are generically known as "polarizing projects"... they'll get a few top quintile and a few bottom quintile rankings, and not much in the middle. If you are the lone judge who tanked a project which everyone else rated highly, you have to stand up and explain what you saw. Sometimes, it's subject matter expertise. Sometimes, it's a question you asked, that the others didn't. And it works both ways. In our category, there were maybe half a dozen projects (out of over 100) that either moved way up or way down.
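The "polarizing" pattern described above can be sketched as a simple check on a project's list of quintiles. This is a hypothetical Python illustration: the thresholds `min_each` and `max_middle_share` are assumptions, and in the actual caucus the call is made by people reading the score sheets, not by a formula.

```python
def is_polarizing(quintiles: list[int], min_each: int = 2, max_middle_share: float = 0.5) -> bool:
    """Flag a project whose judges split between the top and bottom quintiles.

    quintiles: one value per judge, 1 (low) .. 5 (high).
    min_each: hypothetical minimum number of top (5) AND bottom (1) marks.
    max_middle_share: hypothetical cap on the share of middle (2-4) marks,
    capturing "not much in the middle".
    """
    top = quintiles.count(5)
    bottom = quintiles.count(1)
    middle = len(quintiles) - top - bottom
    return (
        top >= min_each
        and bottom >= min_each
        and middle <= max_middle_share * len(quintiles)
    )


print(is_polarizing([5, 5, 1, 1, 3]))  # split verdict: worth discussing
print(is_polarizing([4, 4, 3, 3, 4]))  # consistent verdict: leave it alone
```

A flagged project is exactly the case where the lone dissenting judge (or judges) should be ready to stand up and explain what they saw.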

Then it's a matter of negotiating the placings of the remaining projects. In our category (each caucus runs things its own way), you basically had to identify another project to "swap" with if you wanted to move a project up or down. For each proposed move, judges who ranked the project high or low stand up and give their pitch for why they thought it was good or bad, relative to the other projects. Hopefully, there's a judge or two who have actually judged BOTH of the projects being evaluated. Interestingly, statistical analysis shows that there's not much value in a single judge having seen both, as compared to no judge having seen both.

Compared to last year, this process changed how I took notes. If you have a project that is especially good or bad, you want to be able to stand up and articulate clearly WHY you believe it should move up or down. If a project has mostly good rankings and one bad one, or mostly bad and one good, and the judge who gave the outlying score isn't willing to stand up and talk (the judging isn't blind; you can click on the score and find out who gave it), then that score gets discounted in the discussion.

As the evening wears on, some judges leave. They have planes to catch, or other things to do. So we make an effort to pick the top winners first, and the real negotiating comes in at the end: do you get third, fourth, or nothing. There's a fair amount of swapping across the bottom boundary, and practically speaking, it depends on the extemporaneous speaking skills of the judges involved, if they are present.

Finally, you have all the placings decided. You take a vote, call it done, and head off to the reception.

Then, at the reception, you talk to the other judges and rehash some of the decisions. I think that the whole caucus and post caucus discussion process is incredibly valuable. A lot of the judges are multiyear judges, and I think this is how we get to a consensus about what is good or bad, in the general sense.

Sunday, May 20, 2012
