Early results for Charley and Frances
What a week we had! We had envisioned many classifications, but received so many more! So far we have received more than 11,000 classifications from nearly 2000 users in June. These storms had never been analyzed on CycloneCenter and Hurricane Charley was completed on the first day! Hurricane Frances is nearly complete now. We will likely have more completely new storms this month.
There are numerous crowdsourced science projects out there, and each has the same goal: to better understand an issue (hurricanes, bats, animal populations, etc.) based on input from numerous clicks and selections from citizen scientists. In addition to the Zooniverse, there are other crowdsourced projects. The concept of learning from a crowd is not new. There are many mathematical and statistical papers that provide a means of learning the best possible answer from everyone’s input.
In our analysis, we have used an approach that estimates the probability of each possible selection for an image based on the selections from individuals, given what those individuals tend to select. It is a pretty complex algorithm that took me a while to understand, so I won’t belabor the point, but I provide links to the papers below. The method described by Raykar et al. is an Expectation Maximization (E-M) algorithm.
Our initial analysis looks at what type of storm the cyclone is, based on the broad categories available: No storm, Curved band, Embedded Center, Eye, Shear or Post tropical. Later, we plan to use this information to estimate the storm’s intensity.
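To give a feel for the idea, here is a minimal sketch of an E-M loop for crowd labels. It is not our actual code and it is simpler than the full method in Raykar et al.: it assumes each classifier can be summarized by a confusion matrix (how often they pick each type given the true type) and ignores any image features. The function name `em_crowd_labels`, the `smoothing` parameter, and the input layout are all assumptions made for this illustration.

```python
# Storm types, indexed 0..5 in the order listed above.
CLASSES = ["No storm", "Curved band", "Embedded Center",
           "Eye", "Shear", "Post tropical"]


def em_crowd_labels(votes, n_classes, n_iter=50, smoothing=0.01):
    """Estimate per-image type probabilities from crowd votes.

    votes: list of (image_id, user_id, class_index) tuples (hypothetical layout).
    Returns {image_id: [p_0, ..., p_{n_classes-1}]}.
    """
    images = sorted({i for i, _, _ in votes})
    users = sorted({u for _, u, _ in votes})

    # Start from the raw vote fractions for each image.
    post = {i: [0.0] * n_classes for i in images}
    for i, _, k in votes:
        post[i][k] += 1.0
    for i in images:
        total = sum(post[i])
        post[i] = [c / total for c in post[i]]

    for _ in range(n_iter):
        # M-step: overall type frequencies and each user's tendencies.
        prior = [sum(post[i][t] for i in images) / len(images)
                 for t in range(n_classes)]
        conf = {u: [[smoothing] * n_classes for _ in range(n_classes)]
                for u in users}
        for i, u, k in votes:
            for t in range(n_classes):
                conf[u][t][k] += post[i][t]
        for u in users:
            for t in range(n_classes):
                row_sum = sum(conf[u][t])
                conf[u][t] = [c / row_sum for c in conf[u][t]]

        # E-step: update each image's probabilities given those tendencies.
        like = {i: list(prior) for i in images}
        for i, u, k in votes:
            for t in range(n_classes):
                like[i][t] *= conf[u][t][k]
        for i in images:
            total = sum(like[i])
            post[i] = [v / total for v in like[i]]
    return post
```

Calling `em_crowd_labels(votes, len(CLASSES))` returns one list of probabilities per image. With only a handful of votes per image the plain products are fine; a real implementation would likely work with log-probabilities.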
Hurricane Charley was relatively short-lived: only 6 days, so only about 48 images. This means it was completed relatively quickly. Contrast that with Frances, which has nearly 150 images.
The following graph shows the basic selections for Hurricane Charley. The selections (or votes) by citizen scientists are shown in the lower graph. Each column shows the selections for a given image of the storm. The percentages show what fraction of the citizen scientists chose each storm type for an image. The upper graph shows the probability of each image type based on the selections and the tendencies of the citizen scientists. These are most often 100% of one type, but can sometimes be a “toss-up” (i.e., no clear winner, as is the case in the first two images of Charley).
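The percentages in the lower graph are just vote fractions per image. A small sketch of that calculation is below; the function name `vote_fractions` and the sample votes are made up for illustration and are not real CycloneCenter data.

```python
from collections import Counter


def vote_fractions(selections):
    """Return the fraction of votes each storm type received for one image."""
    counts = Counter(selections)
    total = len(selections)
    return {label: n / total for label, n in counts.items()}


# Hypothetical votes for a single image.
example = ["Eye", "Eye", "Shear", "Eye"]
print(vote_fractions(example))
```

The upper graph differs from this because it also weights each vote by what that classifier tends to select.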
Also, there is quite a bit of variance in the selections and no clear time period when the storm had an eye. This is partly an artifact of the satellite imagery. Each pixel is about 8 km across, while operational data available to forecasters can be as fine as 1 km per pixel. That finer resolution helps identify small eyes.
Even though Hurricane Frances is still available for classifying, the early results are very good. They show a bit more consistency in the selections. Since it isn’t done yet, there are some images with fewer than 10 classifications, but it looks consistent so far.
The graph shows strong agreement on storm type at various stages of hurricane development. The storm rapidly developed an eye by about day 3. It maintained an eye for most of the time between days 4 and 9. Then the primary type became embedded center, with some selections of other types (e.g., shear). By day 12, the storm had begun to dissipate and was largely being classified as post-tropical or No storm.
Most of the users this month are new, so these results certainly aren’t final. The learning algorithm needs many more samples from all the new classifiers to understand their tendencies more accurately. As time goes on and those who were active on these storms classify other storms, the E-M algorithm will refine the results for these storms.
Nonetheless, the results are very encouraging. In fact, we’ve made more than 180 of these plots for all storms that are complete (or nearly complete). The next step will be to further analyze the results and see how best to estimate storm intensity from these classifications.
The following papers were crucial in our initial analysis of the CycloneCenter data.
VC Raykar, S Yu, LH Zhao, GH Valadez, C Florin, L Bogoni, L Moy (2010). Learning from crowds. The Journal of Machine Learning Research 11, 1297-1322.
This article is the basis for our current algorithm. At first I used the binary approach to determine which images had eyes. Then I applied the multi-class approach (section 3) for all storm types.