With The Next HOPE less than a week away, it’s great to look back at The Last HOPE and the data that was collected during it. This article will explore what was released to the public and some of the cool data mining that is possible with this data.
Getting the data:
To start, we can obtain the data from the CRAWDAD wireless dataset project. The Last HOPE data can be found at http://crawdad.org/meta.php?name=hope/amd
In browsing the dataset we will concentrate on different files as we need to relate users in different ways. As this data is heavily normalized, we will usually need two or three files to get at the relationships we are interested in.
Some of the more interesting CSV files are:
- creation.csv – This file records when a user created an account, and which registration code they used.
- person.csv – This is the profile record. It contains the user’s handle, their stated age, gender, location, cell provider, etc. None of this data is verified or validated. This data is useful for labeling and for getting beyond user ids. Many of these handles are designed to be recognized, so mining forums for them could tell us more about the user.
- ping.csv – This file records everybody that pinged somebody else and when they did it. Think of this like Facebook’s “poke” feature.
- position_snapshot.csv – This file records which zone each user is in, sampled twice a minute.
- talk_presense.csv – This file joins user locations with where talks were being held to infer which talks a user attended. This can be further used with talks.csv to link users with talk descriptions and interests associated with a talk.
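Because the data is normalized, most questions start with a join between two of these files. Here is a minimal sketch of that idea in Python, attaching handles from person.csv to the ids in ping.csv and counting who received the most pings. The column names (id, handle, from_id, to_id, timestamp) and the sample rows are assumptions made up for illustration, so check the header row of the real files and adjust before running it against the dataset.

```python
import csv
import io
from collections import Counter

# Made-up sample rows with HYPOTHETICAL column names.
# The real person.csv and ping.csv headers may differ.
PERSON_CSV = """id,handle
1,alice
2,bob
3,carol
"""

PING_CSV = """from_id,to_id,timestamp
1,2,2008-07-18 10:00:00
3,2,2008-07-18 10:05:00
2,1,2008-07-18 10:06:00
"""


def load_handles(person_file):
    """Map each user id to its handle."""
    return {row["id"]: row["handle"] for row in csv.DictReader(person_file)}


def count_pings_received(ping_file, handles):
    """Count pings per recipient, labeled by handle when one is known."""
    counts = Counter()
    for row in csv.DictReader(ping_file):
        # Fall back to the raw id if the user has no profile record.
        label = handles.get(row["to_id"], row["to_id"])
        counts[label] += 1
    return counts


handles = load_handles(io.StringIO(PERSON_CSV))
pings_received = count_pings_received(io.StringIO(PING_CSV), handles)

for handle, total in pings_received.most_common():
    print(handle, total)
```

To try this on the real files, swap the io.StringIO objects for open("person.csv", newline="") and open("ping.csv", newline=""). The same lookup-then-count pattern should carry over to the other pairs of files, for example linking talk_presense.csv to talks.csv.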
Check out http://sunnythinking.org/blog/mining_thelasthope.html for the rest of this analysis.