informative fast-paced

This was a good book highlighting a lot of the good and interesting work being done with large modern datasets. I think the ideas behind it are very important to our world. At times the author is bogged down by too many examples and spends a little too much time on quirky results, but the work is interesting and timely.

So fascinating. Presents such rich and interesting data in an easy-to-consume way.
A lot of books based solely on economic data can be absolute snooze fests, but the author was engaging and kept me turning the pages. Statistically, somebody reading a book written by an economist is unlikely to finish it (learnt that in the book), but I read every damn word!!
informative

I really liked it too! I want more though! I felt like it referenced a lot of interesting things about human online behavior, but I wanted more studies/findings.

The sections of this book that dealt with a person's private thoughts, dark and morbid as they may be, were comforting because they reveal that even outwardly well-adjusted people google exactly what you expect them to when they think they're not being watched.

What was more interesting to me was the story of Jeff Seder, the man who discovered, after thousands of hours of working with horses, that 1 in every 10,000 horses has an abnormally large left ventricle.
One horse in particular, called American Pharoah, has a left ventricle in the 99.6th percentile. This allowed American Pharoah to achieve some of the most prestigious awards in the world of horse racing.

This book stresses that people are more true to themselves online than they will ever be in person. It also tells us that every online activity we partake in reveals a little bit about our personalities and can be exploited for profit by companies who have us "figured out."
informative fast-paced

Oh my dear lord, this book. This should have been a great book, one in which Stephens-Davidowitz explains how big data is changing the kind of research we can do (it is), peppers with a couple of key insights, and then maybe, just maybe, discusses some of the problems with it. And to be fair, the book kinda does do all those things. But it is somehow all built through a lens of the hero's journey, where Stephens-Davidowitz is the hero, big data/Google Trends is the all-powerful weapon he finds, and a complete understanding of the world as we know it is the prize. He consistently overstates his findings, minimises the issues raised by this technology, and frequently simply ignores basic research practices. While he takes great patronising care to explain the difference between correlation and causation, he frequently overlooks it in his own analysis. In short, much of the book is not justifiable, even though what it is discussing is very important. Also, in case you can't tell, I found the author irritating as hell.
Stephens-Davidowitz starts the book by talking about the amazing insights we can draw from Google Trends data, which looks, in aggregate, at what people are searching for in Google. He makes big claims for this data and, given its potential, they should be justified. Aggregated data tracking lets us know a huge amount about how people behave, not just how they say they behave. But the big insights he pulls out? Women worry about their genitalia almost as much as men do; people have *waaay* less sex than they claim to; incest porn (for men) and rape porn (for women) are more popular than polite society is prepared to admit; overtly racist folks still abound; and covert sexism is pervasive even in liberal demographics. Bored yet? There is more: would you believe that more women tell Google that they regret having children than tell it they regret not having them?
See, the thing is, none of these things would surprise experts in the field (even the extensive porn bits wouldn't be telling Dan Savage anything new). The last one wouldn't surprise any sleep-deprived parent at all. Which brings me to the issue with Stephens-Davidowitz' analysis here. See, he just assumes that people tell the truth to Google. And because of this, he assumes that this search trend means many women regret having kids. He doesn't consider how many of those confessions might have been made by a mother holding a phone in one hand and a baby who hasn't stopped screaming for hours with the other. There is evidence that many women regret having children, but you need more than a single source to come to a conclusion about what is motivating behaviour: and typing into Google is behaviour, not a direct transmission of thoughts.
The thing is, while none of these insights by themselves is that revelatory, this body of research is incredibly significant. We've never had access to so much behavioural data before. But it is still one input; it has to be assessed in light of context and input from other data.
In the detail of the book, Stephens-Davidowitz acknowledges this. After leading the book with the Google Flu project, designed to track an epidemic via searches, he admits that it doesn't yet work as well as conventional methods (which, by the way, set a low bar). The project to track world economic movements by measuring night light over developing cities also came out no more accurate than current methods, but when combined with current methods it did much better.
And this is the point really. Data science is just more methods for old science - or humanities - research. Historians use textual analysis to reveal changing attitudes, building upon the methods they use for in-depth research of particular texts. They are still historians, with detailed training in a domain topic. Data analysis is a method, not a new kind of science. Stephens-Davidowitz shows no awareness of this, instead writing statements like: "I have not yet been able to use all this unprecedented data on adult sexuality to figure out precisely how sexual preferences form." and [with data science] "social science is becoming real science". 
Possibly an even bigger issue is a lack of discussion about why this data is being collected, who is analysing it and why. Stephens-Davidowitz is positively cloying about Google as an employer, at one point wondering: "How can one of the biggest and most competitive tech companies in the world seemingly be so relaxed and generous?" (The answer, by the way, seems to be by being so smart you can afford to be wonderful). There is no hint of the criticisms of the work culture in Silicon Valley here. More significantly, Stephens-Davidowitz doesn't seem to understand, or certainly doesn't discuss, the difference between publicly owned and funded research directed toward public good, and marketing exercises designed to increase profit, even at the expense of stability. He discusses A/B market testing at length, and explains it well, but also describes it as a form of double-blind testing, completely ignoring the framework and purpose of the exercise. This is also where some of the confusion regarding cause/correlation comes in. For much of what data scientists are trying to do, the "why" is irrelevant: knowing two factors correlate is enough to predict, but understanding is another thing altogether. Stephens-Davidowitz blithely states instead: "when trying to make predictions, you needn’t worry too much about why your models work."
I wanted this book to be as good as Cathy O'Neil's Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. It wasn't even close, but it is still one of the few joyful explorations of the kind of techniques that might change the way we understand human nature.



This book is in that particular genre where the author tries to make his or her area of expertise (often physics for some reason, though clearly not in this case) palatable and accessible to the "common (wo)man." These types of books fail when the author doesn't dumb it down enough or dumbs it down too much. Stephens-Davidowitz's area is economics/social science by way of Big Data, and he dumbs it down just the right amount.

At the beginning of the book, my inner skeptic was anxiously asking about correlation vs causation and how people can know they're asking the right questions of the right data. By the end of the book, Stephens-Davidowitz had satisfactorily addressed most of my initial concerns and provided some insight into data science, social science, and some aspects of human nature along the way. Plus, the book made me laugh (well, chuckle) out loud more than a few times, which means I was pretty engaged and is not bad for a book about data science.

Some notes:
- The subtitle ("Big Data, new data, and what the internet can tell us about who we really are") is slightly misleading. While much of the book does rely on search queries (predominately Google) and Twitter and Facebook updates, plenty of the analysis and studies rely on non-internet data sources. Stephens-Davidowitz is clearly excited about all of the new ways to use all of the new internet data, but the overall focus of the book is on Big Data of all kinds and its powers and drawbacks.
- Some chapters illustrate the fact that people admit things on the internet they would not admit elsewhere. Issues addressed include porn preferences and racism, both discussed in detail, and child abuse, suicide, and similar, discussed in less detail. Although the possible conclusions range from unsavory to downright depressing, the topics are relevant to the book's points about data and social science; still, they are worth noting because some readers will be sensitive to these topics.

(Thank you, Dey Street Books and GoodReads for the ARC.)

I found the first half really interesting. It puts out a lot of fun statistics that make you think back to your life and experiences and you sorta go “ahh”. The second half was boring as heck and I had to force myself to finish. I’m not convinced I even did.