2.25

Oh my dear lord, this book. This should have been a great book, one in which Stephens-Davidowitz explains how big data is changing the kind of research we can do (it is), peppers it with a couple of key insights, and then maybe, just maybe, discusses some of the problems with it. And to be fair, the book kinda does do all those things. But it is somehow all framed as a hero's journey, where Stephens-Davidowitz is the hero, big data/Google Trends is the all-powerful weapon he finds, and a complete understanding of the world as we know it is the prize. He consistently overstates his findings, minimises the issues raised by this technology, and frequently simply ignores basic research practices. While he takes great patronising care to explain the difference between correlation and causation, he frequently overlooks it in his own analysis. In short, much of the book is not justifiable, even though what it is discussing is very important. Also, in case you can't tell, I found the author irritating as hell.
Stephens-Davidowitz starts the book by talking about the amazing insights we can draw from Google Trends data, which looks - in aggregate - at what people are searching for in Google. He makes big claims for this data and, as far as its potential goes, they are justified. Aggregated data tracking tells us a huge amount about how people actually behave, not just how they say they behave. But the big insights he pulls out? Women worry about their genitalia almost as much as men do; people have *waaay* less sex than they claim to; incest porn (for men) and rape porn (for women) are more popular than polite society is prepared to admit; overtly racist folks still abound; and covert sexism is pervasive even in liberal demographics. Bored yet? There is more: would you believe that more women tell Google that they regret having children than tell it that they regret not having them?
See, the thing is, none of these things would surprise experts in the field (even the extensive porn bits wouldn't be telling Dan Savage anything new). The last one wouldn't surprise any sleep-deprived parent at all. Which brings me to the issue with Stephens-Davidowitz's analysis here. See, he just assumes that people tell the truth to Google. And because of this, he assumes that this search trend means many women regret having kids. He doesn't consider how many of those confessions might have been made by a mother holding a phone in one hand and a baby who hasn't stopped screaming for hours in the other. There is evidence that many women regret having children, but you need more than a single source to come to a conclusion about what is motivating behaviour, and typing into Google is behaviour, not a direct transmission of thoughts.
The thing is, while none of these insights by themselves is that revelatory, this body of research is incredibly significant. We've never had access to so much behavioural data before. But it is still only one input, and it has to be assessed in the light of context and of input from other data.
In the detail of the book, Stephens-Davidowitz acknowledges this. After leading the book with the Google Flu project, designed to track an epidemic via searches, he admits that it doesn't yet work as well as conventional methods (which, by the way, set a low bar). The project to track world economic movements by measuring night light over developing cities also came out no more accurate than current methods on its own, but combined with current methods it did much better.
And this is the point, really. Data science is just more methods for old science - or humanities - research. Historians use textual analysis to reveal changing attitudes, building upon the methods they use for in-depth research of particular texts. They are still historians, with detailed training in a domain topic. Data analysis is a method, not a new kind of science. Stephens-Davidowitz shows no awareness of this, instead writing statements like "I have not yet been able to use all this unprecedented data on adult sexuality to figure out precisely how sexual preferences form" and [with data science] "social science is becoming real science".
Possibly an even bigger issue is the lack of discussion about why this data is being collected, who is analysing it and why. Stephens-Davidowitz is positively cloying about Google as an employer, at one point wondering: "How can one of the biggest and most competitive tech companies in the world seemingly be so relaxed and generous?" (The answer, by the way, seems to be by being so smart you can afford to be wonderful.) There is no hint of the criticisms of the work culture in Silicon Valley here. More significantly, Stephens-Davidowitz doesn't seem to understand, or at least doesn't discuss, the difference between publicly owned and funded research directed toward public good, and marketing exercises designed to increase profit, even at the expense of stability. He discusses A/B market testing at length, and explains it well, but also describes it as a form of double-blind testing, completely ignoring the framework and purpose of the exercise. This is also where some of the confusion regarding cause and correlation comes in. For much of what data scientists are trying to do, the "why" is irrelevant: knowing that two factors correlate is enough to predict. Understanding is another thing altogether. Stephens-Davidowitz blithely states instead: "when trying to make predictions, you needn’t worry too much about why your models work."
I wanted this book to be as good as [a:Cathy O'Neil|6928121|Cathy O'Neil|https://images.gr-assets.com/authors/1514301784p2/6928121.jpg]'s [b:Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy|28186015|Weapons of Math Destruction How Big Data Increases Inequality and Threatens Democracy|Cathy O'Neil|https://images.gr-assets.com/books/1456091964s/28186015.jpg|48207762]. It wasn't even close, but it is still one of the few joyful explorations of the kind of techniques that might change the way we understand human nature.