swagat_siddhartha's review

4.0

Interesting use cases throughout the book, with some hard-hitting facts about the current mindset of the internet-accessing generation. A good place to start if one needs to understand what big data means!

I liked it!

Why I picked it up

Mel was reading it for her company book club. I decided to come along for the ride!

What I learned

* A lot of trivia, from silly to uncomfortable, related to sexual preference, professional sports, and politics.

* A little bit about how big data works and doesn't work: its applications and limitations, and some ethical concerns surrounding it.

What I didn't like

Skip the whole conclusion. It is silly and off topic.

ajhampton0524's review

4.0

A natural successor of works like Freakonomics, a contemporary of Dataclysm, and a cousin to The Secret Life of Pronouns. The questions and techniques the book raises are fascinating and worthy of revisiting often. The writing and methodological rigor might be uneven across analyses, but it's never boring. Great for people without a background in social sciences to see what we're getting into these days.

shelves's review

3.0

should've just wrote an extended new yorker article or some shit.

i'm beginning to think my issue is just reading books by white dudes, whose fundamental perspective in life is narrow as fuck and differs greatly from mine (also see: my take on what could b a phenom sf novel but is sadly impeded by white guyism) and that i should probably stop reading books authored by people like these. but you never know, do you? you never know if you'll end up with a pratchett or if you find yourself mid-way through a book about big data and google searches and having to read the line "How many American men are gay? This is a legendary question in sexuality research".

seriously.

seriously.

i guess, to use this example, what i'd expected is a breakdown on why knowing how many american men are gay is important. cast allusions to the way of life, to the economy, to the bias towards heterosexual nuclear families as fundamental to the functioning of capitalism as we know it. anything but "haha 4.8% of men in texas search for gay porn!! way higher than rhode island!! but facebook surveys say only 2% watch gay porn!!". though subsequently he does address this via his thesis statement: that google searches are a more honest way of obtaining data. women who suspect their husbands are gay (and of course, this only applies to closeted gay/bi/pan/what have you men) do sometimes google "how do i know my husband likes dick?".

what i do enjoy about this book is him introducing inventive ways of obtaining data. what do you look at vs. what don't you? how do you avoid making correlation/causation assumptions? he introduces a number of pitfalls (coin 372) that i've never heard of when it comes to data analysis, and that was fascinating. but he lets his """humorous personality""" (his words, not mine) eclipse the data side of this and it sucks. for example:
Perhaps someone in the president’s office had read Soltas’s and my Times column, which discussed what had worked and what didn’t. For the content of this speech was noticeably different.

When people learn that I am a data scientist and a writer, they sometimes will share some fact or survey with me. I often find this data boring—static and lifeless. It has no story to tell. Likewise, friends have tried to get me to join them in reading novels and biographies. But these hold little interest for me as well. I always find myself asking, “Would that happen in other situations? What’s the more general principle?” Their stories feel small and unrepresentative.

conclusion? this dude needs to get on more qual studies.
funny informative fast-paced

thatlibrarynerd's review

4.0

An interesting book explaining big data through specific examples, with prose made engaging with personal anecdotes. It is not, and does not purport to be, the definitive text on big data; the author acknowledges personal bias. It is, however, a very accessible book on data for people without a strong background in math and science.
funny slow-paced

courtneydoss's review

4.0

I've seen this book condemned in the reviews below for failing to accurately express the difference between correlation and causation, for making logical leaps, and for presenting the topic of the book through disjointed, completely unrelated factoids. That is not my issue. I understand that there is a difference between correlation and causation, and definitely am aware that not everything mentioned in this book is quite as profoundly meaningful as the author might believe. I did not come to this book as a definitive resource on data analytics. I came to this book to be entertained, and entertained I was.

The premise of this book is a detailed look at the science of data analytics, revolutionized with the advent of the internet. It presents the thesis that viewing people through their internet browsers rather than traditional methods that rely on the truthfulness of the participant is a more effective form of study, and that by studying internet trends we are able to learn much more about the human condition than ever before. By reviewing the data presented through the internet, a place where users are more likely to reveal truths about themselves as compared to methods that involve divulging said truths to a third party, Stephens-Davidowitz is able to reach specific conclusions that serve to debunk previously held beliefs.

An example of this is the assumption by the general public that professional athletes, particularly those within the NBA, are more likely to come from low-income families. Stephens-Davidowitz challenges this assumption through analyzing three separate types of data to eventually conclude that it is upper middle class families, more than any other demographic, that raise successful basketball players. Another example is the correlation between organ size and performance in race horses. Challenging the previously held consideration of pedigree in a horse's success, Stephens-Davidowitz presents a study that proves pedigree to be virtually irrelevant to performance. The why of this correlation is irrelevant, states Stephens-Davidowitz. Sometimes it is sufficient to know that something is correlated without knowing the how or why.

In addition to presenting conclusions found through data analytics, Stephens-Davidowitz shows a large amount of information accumulated through analyzing Google data. Through this method, he shows that women are more giving sexual partners, that a massive number of people regret being parents, and that there is a depressing, yet unsurprising correlation between areas where Trump is most popular and areas that most liberally search for the N-word.

This book was fascinating, and I spent a lot of time relaying the most interesting factoids to my family and friends while reading through it. While most of this information won't serve any real purpose in my own life, it provides an excellent basis for conversation, which is always nice right before heading into the holidays. This book wasn't perfect, but I recommend it highly to anyone that is as fascinated by the topic as I am.
reflective slow-paced

This book had the potential to be amazing, but for me it fell pretty flat. The idea was interesting, but the methodology felt sloppy and the writing, even though the author attempted to add some levity, ended up being really dry. It wasn't terrible, but it wasn't great.

What a fantastic book. It's Freakonomics, but with Big Data studies. The author asks himself a thousand questions and answers them using big data, mainly through detailed analysis of the searches people make on Google (and on other sites, such as PornHub).
The origin of the book is fantastic:
[Nathan] Silver found that the single factor that best correlated with Donald Trump’s support in the Republican primaries was that measure I had discovered four years earlier. Areas that supported Trump in the largest numbers were those that made the most Google searches for “nigger.”


Which leads the author to a statement of intent:

I am now convinced that Google searches are the most important dataset ever collected on the human psyche. [...] In fact, at the risk of sounding grandiose, I have come to believe that the new data increasingly available in our digital age will radically expand our understanding of humankind.


The author takes us on a journey through a lot of interesting topics. One of my favorites is the old, very old question: what percentage of the population is gay? An old study from the 1960s-70s said that one in every 10 people is gay. But the author brings that down to 1 in 20, and he explains how he reached that conclusion so well that he convinced me completely. It's a figure that seemed impossible to know precisely, and the guy goes and does it.

There is a lot of data, such as our likes and preferences on Facebook, that is also useful for some things, but not for everything. Our Google searches are always sincere. Our Facebook likes are not. Of two magazines with the same circulation, one a gossip magazine and the other literary, on FB the literary one had more than twice as many likes as the gossip one. The same happens with surveys:
Many people underreport embarrassing behaviors and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias.


More interesting data:
After making their decision—either to reproduce (or adopt) or not—people sometimes confess to Google that they rue their choice. This may come as something of a shock but post-decision, the numbers are reversed. Adults with children are 3.6 times more likely to tell Google they regret their decision than are adults without children.

About 28 percent of girls are overweight, while 35 percent of boys are. Even though scales measure more overweight boys than girls, parents see—or worry about—overweight girls much more frequently than overweight boys.

On weekends with a popular violent movie, the economists found, crime dropped.

Students who were taught fractions via a game tested worse than those who learned fractions in a more standard way.


The author uses big data for self-help, when he says we shouldn't trust everything people post on Instagram:
In fact, I think Big Data can give a twenty-first-century update to a famous self-help quote: “Never compare your insides to everyone else’s outsides.” A Big Data update may be: “Never compare your Google searches to everyone else’s social media posts.”


He also drops gems of humor:
February 27, 2000, started as an ordinary day on Google’s Mountain View campus. The sun was shining, the bikers were pedaling, the masseuses were massaging, the employees were hydrating with cucumber water.


And he shows signs of philosophical depth:
Milan Kundera, the Czech-born writer, has a pithy quote about this in his novel The Unbearable Lightness of Being: “Human life occurs only once, and the reason we cannot determine which of our decisions are good and which bad is that in a given situation we can make only one decision; we are not granted a second, third or fourth life in which to compare various decisions.”



There is much, much more. What differences in life can we expect between the last person admitted and the first person rejected at a prestigious school? Which words in a loan application are a clear indicator that the person is less likely to repay it? Is it legitimate to use this knowledge to deny loans?

The book goes on and on. It has a ton of notes, all placed at the end (almost a third of the book [!!]), and it closes with an argument in favor of big data:

The days of academics devoting months to recruiting a small number of undergrads to perform a single test will come to an end. Instead, academics will utilize digital data to test a few hundred or a few thousand ideas in just a few seconds. We’ll be able to learn a lot more in a lot less time. [...] How do ideas spread? How do new words form? How do words disappear? How do jokes form?


Fascinating. Fun. Instructive. Fantastic. Essential.