A review by cade
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are by Seth Stephens-Davidowitz

3.0

This book presents some interesting results from an exciting field with a lot of potential. However, I think the author thinks a little too highly of his field. He spends too long trying to convince the reader of certain ideas about big data that are pretty obvious to everyone (at least to anyone likely to be attracted to a book about data analysis in the first place). Most people immediately intuit the concept of using big data without condescending encouragement that we shouldn't be intimidated by the big scary subject that the author is a master of. For example, he explains that Facebook posts/likes are not necessarily representative of reality because people sometimes (gasp) try to post things that will impress their friends. I think we all already knew that. He says that comparing the salaries of Harvard graduates with the salaries of another school's graduates is not a good way to tell how much going to Harvard improves your earning potential. Again, that should be obvious to any thinking person.

Despite going to some length to elucidate these obvious potential pitfalls of data analysis, he fails to adequately address other potential shortcomings. In particular, he extols at some length the virtue of Google search data, specifically its honesty. While Google data is probably free of some systematic problems with traditional (i.e. survey and testing) data, I do not think the author adequately considers alternative motives/meanings for some of the searches he discusses. If you limit the Google searches you analyze to those in the form of a certain question (e.g. "Is my husband ________"), are you getting skewed data because only a certain type of person queries in explicit questions instead of keywords (e.g. "________ husband")? He chuckles at some personal questions people ask that Google can't possibly answer (e.g. "How tall am I") but fails to seriously discuss what rational purpose a person might have in mind when typing them.

The author seems to be an astute economist, and he is clearly introspective about some larger issues in his field. I wouldn't be surprised if he actually has a more sophisticated and nuanced view of the strengths and limitations of the data and has more rigorously considered certain seeming deficiencies in the studies that he describes. However, even if we give him the benefit of the doubt and assume he thinks about these issues but leaves them out of the book, that feels condescending, as if he doesn't think the reader is sophisticated enough to consider them.