Who is talking about LOST and where?
I must admit that I can’t answer WHY people are talking about LOST!
Since LOST concludes its six-year run tomorrow night, I wondered, “Who is talking about LOST the most today?” (By talking, I mean tweeting on Twitter; data as of May 22, 2010.) The talk about LOST is picking up in intensity (82% more today than a month ago).
- 58% are men and 42% women
- 35-44 is the biggest age group (16.2%), followed closely by the 45-54 (15.9%) and 25-34 (15.3%) groups
- Hawaii (7.6%) and North Dakota (3.5%) lead the U.S.
Granted, this is a Twitter search for people using the word “lost,” and not all of them may be referring to the TV show. Perhaps more people get lost in Hawaii and North Dakota than in other states? No offense intended!
Where did I get this data? From the Lexicalist website (a demographic dictionary of modern American English).
I read about Lexicalist on the Visual Thesaurus website, which I have subscribed to for a couple of months (great for writers). The following comes from a guest post by David Bamman on the Language Log blog, if you want to read about how and why he is running Lexicalist.
The goal of the Lexicalist project is to develop a dictionary that depicts, in real time, the changing demographics of English in the United States, a dictionary that supplements the fundamental meaning of a word or phrase with the current cultural backdrop that’s informing its use today.
What did Bamman discover about Twitter?
- Twitter is “the language of how millions of people across the world talk to their friends.”
- Twitter is colloquial
- As of April 2010, Twitter had approximately 106M registered users
- Twitter is a rich data source for inducing the demographics of that language community.
- Geographic information embedded in each tweet allows us to map language use across the US
- Lexicalist throws away all tweets where it isn’t over 99% sure of the physical location
- One can map the usage of words and phrases across the US by normalizing each word’s count by the volume of total data coming out of each state. Comparing these resulting ratios allows us to get a demographic picture of word use across the US.
- The data allows us to detect regionalisms in slang as well.
- Age and gender data can be approximated on a large scale using common demographic indicators such as the user’s first name
- How do they do this? “Compute a probability distribution for the entire age range between 12 and 75 and increment the weight count of each word according to this distribution.”
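To make the last few points concrete, here is a rough Python sketch of the two ideas: dividing a word's count in each state by that state's total tweet volume, and spreading each use of a word across ages according to a probability distribution. This is only my illustration, not Bamman's actual code. The function names, the state counts, and the age probabilities are all made up for the example.

```python
from collections import defaultdict


def state_usage_ratios(word_counts, total_counts):
    """Divide a word's count in each state by that state's total volume."""
    return {
        state: word_counts.get(state, 0) / total
        for state, total in total_counts.items()
        if total > 0  # skip states with no data to avoid dividing by zero
    }


def add_weighted_age_counts(age_weights, word, age_distribution):
    """Add one use of `word`, split across ages by the given probabilities."""
    for age, probability in age_distribution.items():
        age_weights[word][age] += probability


# Hypothetical counts of tweets containing "lost", per state.
word_counts = {"HI": 76, "ND": 35, "CA": 200}
# Hypothetical total tweet volume per state.
total_counts = {"HI": 1000, "ND": 1000, "CA": 20000}

ratios = state_usage_ratios(word_counts, total_counts)
top_state = max(ratios, key=ratios.get)

# Hypothetical age distributions guessed from two users' first names.
age_weights = defaultdict(lambda: defaultdict(float))
add_weighted_age_counts(age_weights, "lost", {30: 0.5, 40: 0.5})
add_weighted_age_counts(age_weights, "lost", {40: 1.0})
```

Notice that California has the most raw mentions in this made-up data, but Hawaii comes out on top once each count is divided by the state's total volume. That is the whole point of normalizing.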