A review by blackshirt
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are by Seth Stephens-Davidowitz

3.0

This is a worthwhile read, although the findings, if not the methodology, will likely be outdated within 10 years. Stephens-Davidowitz tells a story for a general audience. It's pretty simple. We, in some ways and to an extent, reveal true things about ourselves in our Google queries.

Men and women ask Google about their physical and relationship insecurities. Men are often concerned with their sexual performance duration, their penis size, and whether their significant other is cheating. Women are often concerned by the smell of their vaginas and why their significant others aren't having sex with them.

Both sexes, at least in our current time period, are interested in incest-themed porn on PornHub, and some women show an interest in rape fantasies on PornHub. Here I should mention that the author rarely uses absolute numbers, but reports rates and comparisons to the opposite sex. So, don't be overly concerned about the prevalence, but you should be aware of the existence, especially because these are taboo subjects.

He finds that rates of out gay men seem lower than expected in certain conservative and rural parts of the country, but that the missing gay men reappear in search queries. The same goes for the supposedly non-existent gay men in Iran and Sochi, Russia. Given this data, Seth expects about 5% of men to be gay.

Kids ask Google about dealing with abusive parents, especially in the aftermath of the 2008 recession. Women in states with restricted abortion access ask Google about DIY abortions. Racists search for racist jokes or combine a racial category with a strong negative adjective.

Some of the story confirms our cultural understanding of how the world works. Some of our cultural stories are disconfirmed. The disconfirmed ones, of course, interest me the most.

He has some warnings about how individuals, businesses, and governments should use big data, and this data in particular. For example, governments should be extremely cautious about trying to use it to predict or prevent crimes by an individual, for compelling statistical and ethical reasons. A good use would allocate resources to a geographic area, such as gay support groups expanding their presence to certain areas or government child services to others where abuse search terms are increasing. I was fascinated that racist search terms spiked during and after a President Obama speech haranguing Americans to be more tolerant, but a later speech that simply told stories of non-white Americans as soldiers, doctors, teachers, engineers, etc. was associated with many fewer racist searches.

There are two obvious pitfalls to be aware of when using this data. First, these queries are not directly measuring the things we are interested in; they are proxies. So, there's a risk of conflating what we can measure with the truth. Second, since predictability does not always lead to explanatory theory, we risk a host of misunderstandings, such as misreading the cause-effect relationship or falling for the turkey problem. The turkey problem is the Thanksgiving turkey who lives a great life and continues to believe it will live a great life until, unbeknownst to it, a certain holiday arrives on the calendar.