May 10, 2014

Advice from Hal Varian to Econ Grad-Students

From an interesting and challenging article by Hal R. Varian:
In fact, my standard advice to graduate students these days is go to the computer science department and take a class in machine learning.
He gives examples of techniques that can help analyse big data and discusses their relevance for economics. He explains:
Google has seen 30 trillion URLs, crawls over 20 billion of those a day, and answers 100 billion search queries a month... At Google, for example, I have found that random samples on the order of 0.1 percent work fine for analysis of business data. (p. 3)
An important insight from machine learning is that averaging over many small models tends to give better out-of-sample prediction than choosing a single model. (p. 24)
An example:
In 2006, Netflix offered a million dollar prize to researchers who could provide the largest improvement to their existing movie recommendation system. The winning submission involved a “complex blending of no fewer than 800 models,” though they also point out that “predictions of good quality can usually be obtained by combining a small number of judiciously chosen methods” (Feuerverger, He, and Khatri 2012). It also turned out that a blend of the best- and second-best submissions outperformed either of them.
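To make the blending idea concrete, here is a minimal sketch in Python. The ratings and the two sets of predictions are made-up toy numbers, not Netflix data, and the helper names `mse` and `blend` are my own. It only illustrates why a simple average of two models can beat both of them when their errors do not line up.

```python
def mse(predictions, truth):
    """Mean squared error between predictions and the true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)


def blend(*prediction_sets):
    """Average the predictions of several models, item by item."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]


# Toy data (assumed for illustration): true ratings for five movies.
truth = [3.0, 4.0, 2.0, 5.0, 1.0]

# Two hypothetical models, each off by half a star on every movie,
# but mostly in opposite directions.
model_a = [3.5, 3.5, 2.5, 4.5, 1.5]
model_b = [2.5, 4.5, 2.5, 5.5, 0.5]

blended = blend(model_a, model_b)

print("model A:", mse(model_a, truth))
print("model B:", mse(model_b, truth))
print("blend:  ", mse(blended, truth))
```

In this toy case the errors of the two models cancel on four of the five movies, so the blend has a lower error than either model alone. With real models the gain depends on how correlated their errors are: the more the models disagree in their mistakes, the more averaging helps.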
Good reading suggestions are in the final summary of the article. 
