February 17, 2009

Theme extraction wrong

Following up on the New York Times rant (I only knock it because I love it), here’s a look at Time.  To boost pageviews on Time.com, they elect to insert internal links right within the content of the page.  To find a relevant link to show, they use some sort of theme extraction algorithm on the paragraph and search for articles that also contain that theme.
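
I have no idea what Time actually runs, so treat this as a guess at the simplest thing that could produce links like the ones below: take the most frequent non-stopword in the paragraph as the "theme," then look it up in an index of tagged articles.  The stopword list, the function names and the `articles` index are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a far larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "we", "you", "with",
             "in", "that", "at", "but", "about", "on", "for", "never"}

def extract_theme(paragraph):
    """Return the most frequent non-stopword token, or None if there is none."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(1)[0][0] if counts else None

def find_related(theme, articles):
    """Return titles of articles whose (hypothetical) tag set contains the theme."""
    return [title for title, tags in articles.items() if theme in tags]

# Hypothetical index: article title -> set of tags.
articles = {
    "Best social networking applications": {"facebook", "social"},
    "Denver, Beer Country": {"beer", "denver"},
}

theme = extract_theme("Facebook is about finding people. But Facebook never forgets.")
related = find_related(theme, articles)
```

Here `theme` comes out as "facebook" and `related` holds the social networking article.  Nothing in this sketch looks at what the paragraph actually says about the theme, which is where the trouble starts.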

The article is a satirical list of why elderly people like Facebook: Why Facebook is for Old Fogies.  Of its 11 paragraphs, 3 had links within the content.  Here’s the first one:

1. Facebook is about finding people you’ve lost track of. And, son, we’ve lost track of more people than you’ve ever met. Remember who you went to prom with junior year? See, we don’t. We’ve gone through multiple schools, jobs and marriages. Each one of those came with a complete cast of characters, most of whom we have forgotten existed. But Facebook never forgets. (See the best social networking applications.)

The extracted theme is Facebook, and the target page is an article about social networking.  Seems relevant enough.

3. We never get drunk at parties and get photographed holding beer bottles in suggestive positions. We wish we still did that. But we don’t. (See pictures of Denver, Beer Country.)

The extracted theme is “beer” and the target is a slideshow of microbreweries in Denver.  Put generously, the connection is tenuous.  The algorithm failed to take into account the negation “never…get photographed holding beer bottles…” and so the added content looks random.
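
A crude way to catch this case, sketched under a big assumption: once a negation cue appears, everything after it in the sentence counts as negated.  Real negation scope is subtler than that, and the cue list and function name here are my own invention, not anything Time uses.

```python
import re

# Illustrative cue list; deliberately incomplete.
NEGATION_CUES = {"never", "not", "no", "don't", "didn't", "won't"}

def non_negated_terms(sentence, candidates):
    """Return candidate terms that appear before any negation cue in the sentence."""
    # Normalize curly apostrophes so "don’t" matches the cue list.
    text = sentence.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", text)
    negated = False
    kept = []
    for tok in tokens:
        if tok in NEGATION_CUES:
            negated = True  # assumption: negation runs to the end of the sentence
        elif tok in candidates and not negated:
            kept.append(tok)
    return kept

negated_case = non_negated_terms(
    "We never get drunk at parties and get photographed holding beer bottles.",
    {"beer"},
)
plain_case = non_negated_terms("We brewed beer in Denver.", {"beer"})
```

With this filter, `negated_case` is empty, so “beer” would not be offered as a theme for paragraph 3, while `plain_case` still keeps it.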

6. We’re old enough that pictures from grade school or summer camp look nothing like us. These days, the only way to identify us is with Facebook tags. (See pictures of a diverse group of American teens.)

The extracted theme is “school” or “kids,” I suppose.  Somehow that connects it to a series of photos of random people talking about themselves.  The problem here is that the tags are too broad.  You could tag almost any content with “living people” or “earth” or “published in 2009” and make these coarse connections, but at the risk of confusing users.
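
One standard way to discount tags that are too broad is inverse document frequency: a tag attached to every article scores zero, and a rare tag scores high.  This is a minimal sketch with a made-up corpus of tag sets; I am not claiming Time weights its tags this way.

```python
import math

def idf(tag, tagged_docs):
    """Inverse document frequency of a tag over a list of tag sets."""
    df = sum(1 for tags in tagged_docs if tag in tags)
    # A tag that never occurs gives no evidence either way, so score it 0.
    return math.log(len(tagged_docs) / df) if df else 0.0

# Hypothetical corpus: one tag set per article.
docs = [
    {"published in 2009", "facebook"},
    {"published in 2009", "beer"},
    {"published in 2009", "school"},
    {"published in 2009", "beer", "denver"},
]

broad = idf("published in 2009", docs)  # on every article
narrow = idf("denver", docs)            # on one article only
```

Here `broad` is exactly zero, so a match on “published in 2009” alone would never justify a link, while `narrow` is the highest score in the corpus.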

On the other hand, the slideshow was oddly engaging… some natural human voyeurism, I suppose, descended from monkeys peering through the brush to discover who was grooming whom.

The lessons here are:

  1. Don’t classify it unless you’re sure.  Theme extraction ain’t easy.
  2. Be smart about simple things like negation.
  3. If you need to, show something broadly interesting…preferably photos, which are consumed quickly and generate tons of page views (== $).
  4. Content and advertising are often the same.
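
Lessons 1 and 3 combine into a simple rule, sketched here: link to the best match only if its score clears a confidence threshold, and otherwise fall back to something broadly interesting.  The scores, the 0.8 threshold and the fallback title are placeholders I picked for illustration.

```python
def choose_link(scored_links, fallback, threshold=0.8):
    """Return the best-scoring link if it clears the threshold, else the fallback."""
    if scored_links:
        link, score = max(scored_links.items(), key=lambda kv: kv[1])
        if score >= threshold:
            return link
    return fallback

# Hypothetical relevance scores in [0, 1].
confident = choose_link({"Best social networking applications": 0.9}, "Photo slideshow")
unsure = choose_link({"Denver, Beer Country": 0.3}, "Photo slideshow")
```

In this example `confident` is the social networking article and `unsure` is the photo slideshow.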

The opinions expressed on this site are mine and do not necessarily represent those of my employer, Facebook. You won’t find any confidential company information here, and while you’re welcome to get in touch with me, I’m afraid I can’t put you in contact with my employer.