May 2010
1 post
Review: Malcolm Gladwell's "Outliers"
Malcolm Gladwell’s Outliers is a collection of anecdotes about successful careers, ranging in domain from computer science to hockey to the law. In these episodes he reveals hidden environmental precursors to eventual success or failure. One example is hockey players in the elite Canadian junior leagues, who are far more likely to have been born in the first few months of the year...
May 2nd
8 notes
April 2010
1 post
Apr 3rd
1 note
May 2009
1 post
Can Lexicon predict unemployment trends?
Haven’t dug too deep into this dataset but I thought these were interesting.  Looks like Lexicon begins to “overshoot” in late January…perhaps because the phrase “laid off” refers not only to new layoffs, but also layoffs that happened in the previous months. I’m sure some hedge fund can use this data to make some quick chalupes before the weekly Bureau...
May 12th
3 notes
March 2009
1 post
Gallup's Mood-Tracker
Check out the Gallup Daily US Mood Tracker: http://www.gallup.com/poll/106915/Gallup-Daily-US-Mood.aspx This chart comes from the Gallup-Healthways Well-Being Index, which is based on a daily poll of Americans (claiming 98% coverage). The survey contains questions about health, diet, well-being, stress, and economic indicators (“Although it’s not very likely that you did, could you tell me if you...
Mar 3rd
1 note
February 2009
1 post
Theme extraction wrong
Following up on the New York Times rant (I only knock it because I love it), here’s a look at Time.  To boost pageviews on Time.com, they elect to insert internal links right within the content of the page.  To find a relevant link to show, they use some sort of theme extraction algorithm on the paragraph and search for articles that also contain that theme. The article is a satiric list of...
Feb 17th
1 note
January 2009
3 posts
Do NYTimes.com readers actually read the news?
1. pop health article about coffee. NOT NEWS
2. sexy pop psychology article. NOT NEWS
3. republican-bashing op-ed. NOT NEWS
4. article about profanity on signs (“butt hole road”, “crapstone, england.”) NOT NEWS
5. article about nationalization of banks. NEWS
6. empty personal finance editorial. “participants in 401(k)’s are in greater danger than ever of...
Jan 27th
3 notes
Jan 4th
Jan 3rd
1 note
November 2008
1 post
Text analytics for democracy
The Obama transition team recently put up a web site, change.gov, that features several contact points for Americans to provide feedback directly to the future administration. One is called “Share Your Story” and another is “Share Your Vision”: “Share with us your concerns and hopes – the policies you want to see carried out in the next four years.” There are also...
Nov 21st
October 2008
5 posts
Primary sources
For most people trying to sell text analysis to marketers, “social media” usually means two things: blogs and Twitter. Why those two, out of all the “social” text on the internet? Let’s go through all the possibilities.
- MySpace
  advantages: target audience, public profiles
  disadvantages: mostly spam, false metadata (“101 years old”), many profiles...
Oct 31st
1 note
Predicting polls with Lexicon
With Facebook Lexicon we’ve been able to aggregate a large volume of public and semi-public conversations taking place among many different types of people in the US. Several gigabytes of raw text go through the Lexicon system every day. It’s a lot of stuff to churn through, and we couldn’t do it without our trusty Hadoop cluster. Back around February I started to get...
Oct 22nd
9 notes
Oct 21st
happy factor
Two friends and I are working on a fun side project called HappyFactor where we randomly send text messages to people throughout the day and ask them how happy they are, and what they are doing. http://www.happyfactor.com The idea is threefold.  First, just by having people think about their happiness in their current context, they can develop mindfulness and increased self-awareness.  Look down...
Oct 16th
6 notes
determining media bias
Brendan’s got an interesting analysis of a Firefox extension by Josh Schachter (del.icio.us founder) that color-codes media sources by their bias. http://anyall.org/blog/2008/10/it-is-accurate-to-determine-a-blogs-bias-by-what-it-links-to/
Oct 16th