Reviews - Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are

662 reviews for:

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are

Seth Stephens-Davidowitz

3.82 AVERAGE

bisthesu's review

2.0

Not very cohesive, jumped around between unrelated data.

sofiaesquivelp's review

2.5

informative reflective slow-paced

A lot of the politics and information is based on the United States. Still, it had interesting data.

keerthanakanchi's review

4.5

challenging funny informative reflective medium-paced

camerawoman's review

5.0

funny hopeful informative inspiring fast-paced

My favorite economics book. I think about it all the time.

bookmeister's review

5.0

Great to the very last page. Yes I finished it Seth.

starcrunch's review

I liked this book. It was all good until the conclusion where he rambles on about how hard it is to finish a book and how he's not getting laid and god knows what else. Then throws out some statistics about how an insignificant number of readers actually finish a book, so why bother editing this shit. Way to condescend to your audience, while at the same time neglecting them. So how's this: Fuck you, Seth Stephens-Davidowitz. I hope you never get laid.

pennyriley's review

4.0

Full title: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are. Stephens-Davidowitz is a data scientist with a PhD in economics from Harvard who has worked at Google. The big data he uses is largely from Google, Facebook, and Pornhub. It’s easy these days to get information on likes and searches, and much, much more. There is so much data that is so easy to access, presumably in many cases more honest than self-reported data, and easy to use for randomised A/B trials. Much of what he uncovers is surprising (an Obama speech that sought to minimise hate crimes and hate speech, and was well received by pundits and the press, actually inflamed racists, as shown by the number of race-hate searches immediately after the speech, for example). Other findings were much less surprising (how often people have sex is overreported). He mines the data to discover how important race was as a factor in both the Obama and Trump elections. He clearly and carefully explains the difference between correlation and causation and how difficult it can be to determine which you are dealing with. Stephens-Davidowitz shows how valuable using such big data can be, while admitting that the data set is not necessarily perfect (how many searches have you made motivated by idle curiosity rather than a genuine search for answers?), but given the sheer amount of data he is confident of his results. An interesting read, sometimes amusing, sometimes deeply serious.

rjsthumbelina's review

5.0

Nonfiction palate cleanser. Really interesting look at big data and how the internet is a wealth of information for those who want to study human behavior and trends. Some definitely creepy info about how companies (like Google and FB) run experiments all the time, without our knowledge, to see what we like better. It also contains a great discussion of morality in data collection, as well as an easily understood explanation of correlation vs causation.

icywaterfall's review

4.0

People’s search for information is, in itself, information. By looking at what people search for, Google can highlight trends that might not be picked up anywhere else. The power of Google is that people tell the giant search engine things they might not tell anyone else. This is Big Data: data that concerns millions of people and cannot be seen from the ground, as it were.

- Good data science is surprisingly intuitive. At its core, data science is about spotting patterns and predicting how one variable will affect another. So if humans are naturally data scientists, and data science is intuitive, why do we need computers and statistical software? Sometimes there is insufficient experience for our unaided gut to draw upon. Also, gut instincts may give a general sense of how the world works, but data helps us to sharpen the picture. Further, our intuition alone is subject to certain biases that may be unseen to us; we tend to exaggerate the relevance of our own experience, overestimate the prevalence of anything that makes for a memorable story, etc.

- There are four unique powers of Big Data. There are many unique data sources that give us windows into areas about which we could previously just guess. Offering up new types of data is the first power of Big Data. Secondly, Big Data allows us to finally see what people really want and really do, not what they say they want and say they do. Providing honest data is the second power. Because there is now so much data, there is meaningful information on even tiny slices of a population. Allowing us to zoom in on small subsets of people is the third power. Big Data further allows us to undertake rapid, controlled experiments. This allows us to test for causality, not merely correlations. Allowing us to do many causal experiments is the fourth power.

- THE FIRST POWER: this revolution is less about collecting more and more data than about collecting the right data. Example time: historically, people have believed that the best way to predict whether a horse will win a race is to analyse its pedigree. While pedigree does matter, it can only explain a small part of a racehorse’s success. Horse agents do use other information to see which horses might win future races; they might analyse the gaits of horses and examine them visually. Jeff Seder was never interested in the traditional methods of evaluation; he cared about data. Seder decided to measure the size of horses’ internal organs and found that the size of the heart was a massive predictor of a horse’s success, and a better predictor than the techniques previously employed. If you want to predict the future, you don’t have to worry about why your model works, simply that it works. Next example: how do you figure out which newspapers are liberal or conservative? Certain phrases are used more by liberals (estate tax, Rosa Parks, workers’ rights) and others more by conservatives (death tax, Saddam Hussein, government spending). By analysing the frequency with which such phrases are used, it’s possible to calculate the bias in the media. Why do some publications lean right and some lean left? The politics of a given area is instructive; the evidence strongly suggests that newspapers are inclined to give their readers what they want. Who owns the paper has much less effect on its political bias than we might think. Many people have viewed American journalism as controlled by rich people or corporations with the goal of influencing the masses; but the owners of the American press give their readers what they want because they are primarily driven by profit. Are the media liberal or conservative? Newspapers slant left, but there is no grand conspiracy; it’s just the workings of good old capitalism.

- THE SECOND POWER: People lie even on anonymous surveys; why? Because they lie to themselves, they want to make a good impression, etc. How can we learn what people are really thinking? Second power = certain online sources get people to admit things they would not admit anywhere else. Surveys tell us that there are far more gay men in tolerant states than in intolerant states; but is this the whole picture? We can measure searches for gay porn in tolerant and intolerant states and compare; the share of men’s pornography searches that are for gay porn (5%) seems a reasonable estimate of the true size of the gay population in the US. Prejudice is another subject people are honest about with Google. Following the San Bernardino shooting, more than half of all searches about Muslims became hateful, whereas ‘only’ 20% were hateful before. Further, searches for the word ‘nigger’ shoot up whenever black people are in the news, when Obama got elected, and on Martin Luther King Jr. Day. Why? The dominant explanation is that, while blacks claim racism and whites deny this racism, there must be some implicit prejudice around. But hidden explicit racism is a more likely explanation; after all, people don’t unconsciously search for ‘nigger jokes’ on Google. Also, the internet isn’t as segregated as many people believe it is; people with strong political opinions visit sites of the opposite viewpoint all the time, purposefully or not. The internet actually brings people of different political views together, which is not what people ordinarily think. But there is a caveat; we should be wary of what people put on social networking sites such as Facebook. On Facebook, we show our cultivated selves, not our true selves. In Facebook world, the average adult seems to be happily married, vacationing in the Caribbean, and perusing the Atlantic. In the real world, a lot of people are angry, on supermarket checkout lines, peeking at the National Enquirer, ignoring the phone calls from their spouse, whom they haven’t slept with in years. If you’re a business you should never trust what your customers tell you; trust what they do. Netflix learned this lesson; if you ask users what films they want to watch, they fill the queue with aspirational films, but a few days later they just watch what they always want to watch: low-brow comedies or romance films. Netflix then stopped asking people what they wanted and started recommending films based on what millions of clicks showed people actually wanted to watch.

- THE THIRD POWER: Big Data allows us to meaningfully zoom in on small segments of a dataset to gain new insights into who we are. Raj Chetty got hold of all Americans’ tax records since 1996, and it allowed him to answer some interesting questions. Is America a land of opportunity? If you take America as a whole, then the average American who starts in the bottom 20% of income earners has a low chance (7.5%) of climbing to the top 20%. But this assumes that America is uniform; if we zoom in on the data, we can see that in some parts of the US, the chance of a poor kid succeeding is as high as in any developed country in the world. In answer to the question: in some parts, America is a land of opportunity; in other parts, it’s not. Which parts are good for poor kids? Areas that spend more on education, that have more religious people and lower crime, and that have fewer black people. What places are best at giving people a chance to escape the grim reaper? For the wealthiest Americans, it doesn’t matter where you live; but for the poorest, life expectancy varies depending on where you live. What factor affects this variance? How many rich people also live in that area. More rich people in a city means the poor there live longer.

- THE FOURTH POWER: randomised experiments are the gold standard for proving causality, and Big Data makes randomised experiments, which can find truly causal effects, much easier to conduct, anytime and anywhere, as long as it’s online. Randomised controlled experiments are also called A/B testing; you test situation A (say, a background for a website) and situation B (a different background) and compare which gets more clicks; whichever gets more clicks (for whatever reason), go with that. Some experiments can’t be conducted in the real world (because they would be unethical, etc.), so causes can’t be established that way; but we can use natural experiments as a stand-in for A/B testing. A great high school known as Stuy is ranked number one in the US. Can we compare what people’s lives would have been like had they entered a school they failed to enter? To test the causal effects of Stuy, we need to compare two groups that are almost identical apart from one tiny variable. We can compare students who just barely missed the cutoff for Stuy with those who just barely made it. This category of natural experiments (using sharp numerical cutoffs) is called regression discontinuity. What did this study find? There was absolutely no difference between students who entered Stuy and those who just barely didn’t manage to get in. They ended up in equally prestigious universities. Stuy students achieve more in life than non-Stuy students because better students attend Stuy in the first place. Stuy doesn’t cause you to perform better. The factors that make you successful are your talent and your drive, not who gives your commencement speech.

- LIMITATIONS on BIG DATA: Can Big Data predict which way stocks are headed? No. If there are many variables, one variable is bound to correlate in some statistically significant manner with a given outcome; but this doesn’t mean that there is a causal relationship between the two. It just means that that variable got lucky. Take IQ and genes; is there one gene that can add a whole bunch of IQ points? Robert Plomin thought he had found the answer by comparing the DNA of geniuses to the DNA of those with average IQs. There was a striking difference between the two; the geniuses had a gene that those of average IQ didn’t have. Was this the gene for IQ? No. A few years later, Plomin got access to another sample of people that included their DNA and IQ, but this time there was no correlation. The curse of dimensionality had struck again. How do you guard against this curse? Humility and detachment from results. Further, even if we can measure some statistics, the things we can measure are often not exactly what we care about. We can measure how well students do on multiple-choice questions, but we can’t easily measure curiosity, a trait much more important than how well someone does on a silly test. What is needed in this case is Big Data coupled with rational human judgement.

msktprsns's review

4.0

More than anything, I really enjoyed the writing in this. Like participating in a conversation with a friend. And since this was a fairly technical topic (or could be), I think it's a real feat to write so clearly and in such an approachable way. Maybe I was particularly grateful for this because I put down the previous book I was reading for being so impenetrable. The general lessons of the book were not particularly new to me, but nearly all of the specific examples and anecdotes were, and I appreciate now having them to hand. Worth the read.
