Predicting Predictions

Here’s a tantalizing prospect. What if, amid the cacophony of online buzz — all the tweets, blogs, searches, likes, and comments showering us every day — there were enough digital clues that we could forecast the next hot product, successful startup, rising stock, or blooming celebrity?

Predicting (and cashing in on) collective behavior
The idea that online activity can be a preview of offline actions makes intuitive sense. Consumers shopping for a new camera may seek recommendations from friends or search the Web for reviews and prices; moviegoers may tweet about films they’ve seen or plan to see; and individuals investigating vacation options may ask questions in online forums or compare airfares. If so, measuring search counts, tweets, and mentions related to retail activity, movies, or travel should help forecasters predict collective behavior of economic, cultural, or political interest.

In recent years, a slew of research papers have appeared (one of them by us and our Yahoo! Research colleagues) to support this intuition. In parallel, startup companies such as WiseWindow, DataSift, and Gnip have begun to mine the social Web for patterns and trends that they then resell to other companies that want to track industry directions or perceptions of their brands. And then in March 2011, London hedge fund Derwent Capital Markets invested $40 million to test an Indiana University professor’s theory that the country’s mood, as reflected by language used on Twitter, could predict the stock market.

Grounds for skepticism
Amid the excitement are grounds for skepticism. The problem is that a forecast can’t really be judged without considering how hard the outcome is to predict. For example, think about predicting the weather in Santa Fe, New Mexico. A prediction that is accurate 82% of the time sounds impressive until you realize that it is sunny in Santa Fe 300 days a year, so one can be correct about 82% of the time (300 days out of 365) simply by predicting sunshine every day. Given this knowledge, any meteorologist whose predictions are not at least this accurate would be considered a joke.
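The Santa Fe arithmetic can be checked with a short sketch. The function names and the skill measure below are illustrative assumptions, not something the forecasting studies mentioned here are known to use; the only figure taken from the text is the 300 sunny days a year.

```python
def baseline_accuracy(majority_days: int, total_days: int = 365) -> float:
    """Accuracy of always predicting the most common outcome."""
    return majority_days / total_days


def skill_over_baseline(forecast_accuracy: float, baseline: float) -> float:
    """Share of the baseline's errors that a forecast eliminates.

    0 means no better than the naive baseline, 1 means perfect,
    and a negative value means worse than the naive baseline.
    (Hypothetical helper, defined here only for illustration.)
    """
    return (forecast_accuracy - baseline) / (1.0 - baseline)


# "Always sunny" in Santa Fe: 300 sunny days out of 365.
always_sunny = baseline_accuracy(300)
print(f"Always-sunny accuracy: {always_sunny:.1%}")

# A forecaster who is right exactly 82% of the time, scored against that baseline.
print(f"Skill of an 82% forecaster: {skill_over_baseline(0.82, always_sunny):+.3f}")
```

Scored this way, a forecaster who is right 82% of the time comes out marginally below zero, because the naive always-sunny rule is right slightly more often than that.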

For the same reason, the predictive power of social media must be judged in relation to statistical models fit with traditional data sources, prediction markets, or expert opinions. For example, the number of screens a movie opens on tells you a great deal about how much money the movie will bring in; today’s stock price tells you a lot about tomorrow’s price; and many trends in social media simply reflect information that appears in other kinds of media.

The buzz on buzz
It’s not enough, in other words, to show that sentiment on Twitter is correlated with the stock market; that sentiment must contain information that can’t be obtained from any other source, such as studying economic indicators, reading the paper, or watching the market itself. Whether this is true remains to be seen.
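One way to make this test concrete is to compare the forecast errors of a model built on traditional data with the errors of the same model after a social media signal is added. The sketch below assumes mean absolute error as the yardstick and uses hypothetical function names; the numbers in the usage example are invented purely to show the calculation and are not results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between outcomes and predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(actual, baseline_pred, augmented_pred):
    """Fraction of the baseline model's error removed by the augmented model.

    A value near 0 means the extra signal (for example, tweet sentiment)
    added little beyond what the traditional data already captured.
    """
    base_err = mean_absolute_error(actual, baseline_pred)
    aug_err = mean_absolute_error(actual, augmented_pred)
    if base_err == 0:
        raise ValueError("baseline is already perfect; improvement is undefined")
    return (base_err - aug_err) / base_err


# Made-up numbers, for illustration only.
actual = [10, 20, 30]
traditional_only = [12, 18, 33]   # e.g. a model using screen counts alone
with_buzz = [11, 19, 33]          # the same model plus a social media signal
print(f"Error reduction from adding buzz: "
      f"{relative_improvement(actual, traditional_only, with_buzz):.1%}")
```

For the comparison to mean anything, both sets of predictions would have to be made on data that neither model saw while it was being fit.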

All reservations notwithstanding, we predict that the buzz on buzz will grow in 2012. Since at least the Oracle of Delphi, scientists and charlatans alike have been trying to predict the future. In our view, the temptation to find in social media the elusive crystal ball will prove simply too strong to pass up.

David Pennock and Duncan Watts are both principal research scientists at Yahoo! Labs. Watts is the author, most recently, of “Everything is Obvious*: *Once You Know The Answer.” Pennock is a regular contributor to The Signal, a Yahoo! blog about political predictions.

Photo by The Planet/Flickr