May 2010
1 post
Review: Malcolm Gladwell's "Outliers"
Malcolm Gladwell’s Outliers is a collection of anecdotes about successful human careers, ranging in domain from computer science to hockey to the law. In these episodes he reveals hidden environmental precursors to eventual success or failure. One example is hockey players in the elite Canadian junior leagues, who are far more likely to have been born in the first few months of the year...
April 2010
1 post
May 2009
1 post
Can Lexicon predict unemployment trends?
Haven’t dug too deep into this dataset but I thought these were interesting. Looks like Lexicon begins to “overshoot” in late January…perhaps because the phrase “laid off” refers not only to new layoffs, but also layoffs that happened in the previous months.
I’m sure some hedge fund can use this data to make some quick chalupes before the weekly Bureau...
March 2009
1 post
Gallup's Mood-Tracker
Check out the Gallup Daily US Mood Tracker: http://www.gallup.com/poll/106915/Gallup-Daily-US-Mood.aspx
This chart comes from the Gallup-Healthways Well-Being Index, from a poll of Americans a day (claiming 98% coverage). The survey contains questions about health, diet, well-being, stress, and economic indicators (“Although it’s not very likely that you did, could you tell me if you...
February 2009
1 post
Theme extraction wrong
Following up on the New York Times rant (I only knock it because I love it), here’s a look at Time. To boost pageviews on Time.com, they elect to insert internal links right within the content of the page. To find a relevant link to show, they use some sort of theme extraction algorithm on the paragraph and search for articles that also contain that theme.
The article is a satiric list of...
January 2009
3 posts
Do NYTimes.com readers actually read the news?
1. pop health article about coffee. NOT NEWS
2. sexy pop psychology article. NOT NEWS
3. republican-bashing op-ed. NOT NEWS
4. article about profanity on signs (“butt hole road”, “crapstone, england.”) NOT NEWS
5. article about nationalization of banks. NEWS
6. empty personal finance editorial. “participants in 401(k)’s are in greater danger than ever of...
November 2008
1 post
Text analytics for democracy
The Obama transition team recently put up a web site, change.gov, that features several contact points for Americans to provide feedback directly to the future administration. One is called “Share Your Story” and another is “Share Your Vision”:
Share with us your concerns and hopes – the policies you want to see carried out in the next four years.
There are also...
October 2008
5 posts
Primary sources
For most people trying to sell text analysis to marketers, “social media” usually means two things: blogs and Twitter. Why those two, out of all the “social” text on the internet? Let’s go through all the possibilities.
- MySpace
advantages: target audience, public profiles
disadvantages: mostly spam, false metadata (“101 years old”), many profiles...
Predicting polls with Lexicon
With Facebook Lexicon we’ve been able to aggregate lots of public and semi-public conversations taking place between lots of different types of people in the US. Several gigabytes of raw text go through the Lexicon system every day. It’s a lot of stuff to churn through, and we couldn’t do it without our trusty Hadoop cluster.
Back around February I started to get...
happy factor
Two friends and I are working on a fun side project called HappyFactor, where we send text messages to people at random times throughout the day and ask them how happy they are and what they are doing.
http://www.happyfactor.com
The idea is threefold. First, just by having people think about their happiness in their current context, they can develop mindfulness and increased self-awareness. Look down...
determining media bias
Brendan’s got an interesting analysis of Josh Schachter’s (del.icio.us founder) Firefox extension that color-codes media sources by their bias.
http://anyall.org/blog/2008/10/it-is-accurate-to-determine-a-blogs-bias-by-what-it-links-to/