challenging informative slow-paced

There are some good ideas here as well as some fairly esoteric ones. For me, much of the value comes from the appendices: the checklists and templates.

Site Reliability Engineering provides a very rare opportunity to get a broad and deep look into how a major Internet company runs their technical operations at immense scale. While very few companies need to build systems at Google-scale, the book still contains an enormous amount of useful information for the rest of us on how to run technical operations. Many of the processes and policies Google employs can be effectively scaled down.

The book isn't just for SREs, either. Developers who want to understand what their ops colleagues do and, more importantly, how they can be effective and supportive partners for them, should read this book. I've gathered up a significant amount of this information in my long career, especially over the last 15 years, but reading this book will obviously be a much, much faster way to learn a lot of this. However, that learning should be combined with experiencing this work in the trenches. The lessons learned in the heat of battle about the need for observability, monitoring, and recoverability will stick with you a lot longer than just reading about them.

It's hard to give this book a star rating - there were some essays that I really enjoyed and learned from, while others I found dry. Perhaps my only recommendation regarding this book would be to not read it cover to cover and instead home in on the parts of Site Reliability Engineering that interest you, and read just those essays.

Somewhat tangential, and likely due to recency bias, but one of the essays I liked the most was one of the last ones, in which the authors interviewed a host of Google software engineers who had backgrounds in industries that also cared heavily about reliability - think air traffic control, working on the 911 system, nuclear power plant engineers, and lifeguards. The essay compared and contrasted those industries' definitions and standards for reliability with those of the Google SRE teams, and while there were quite a lot of similarities, there were also notable differences which I found interesting.

What I liked the most is that this book covers pretty much all aspects of SRE. The examples presented lack details. Also, the fact that it's a collection of essays means the quality can vary from chapter to chapter. But overall, a must-read book for anyone in the industry. If only more of the big players would share their experience in such a nice way.

“Perfect algorithms may not have perfect implementations.”

And perfect books may not have perfect writers. Site Reliability Engineering is an essay collection that can be rickety at times but is steadfast in its central thesis. Google can claim credit for inventing Site Reliability Engineering and, in this book, a bunch of noteworthy engineers share their wisdom from the trenches.

When it comes to software architecture and product development, I’ve found delight in reading about how startups’ products are built because the stories are digestible. It’s possible for a founder, lead engineer, or technical writer to lay down the blueprint of a small-scale product and even get into the nuts and bolts. When it comes to large tech companies, this is impossible from a technical point of view and improbable from a compliance standpoint.

This is beside the purpose of the book, but arrangements like this one help bridge the gap between one’s imagination and the inner workings of tech giants. There are plenty of (good!) books that tell you all about how Google the business works, but this one happens to be the best insight into how the engineering side operates. Sure, you have to connect some dots and bring with you some experience, but the result is priceless--you start to feel like you get it.

The essays are almost all useful. If you haven’t spent at least an internship’s worth of time in the workforce, you should probably table this one until you have a bit more experience. I would have enjoyed this book as an undergraduate, no doubt, but most of it wouldn’t have clicked. The Practices section--really, the meat of the book--is where the uninitiated might struggle. When I emerged on the other side I had a list of at least twenty topics that I needed to explore in more detail if I was to become truly great at what I do.

I highly recommend this book to anyone on the SRE/DevOps spectrum as well as those trying to understand large-scale tech companies as a whole.


Is Google's reality as beautiful as described in this book? If so, it is heaven for people who like a structured way of working. This describes the wet dream of structured sysadmins: not the ones who love to be honored for their firefighting capabilities, but the ones who are only noticed when they are gone ... A must-read for every sysadmin, even the firefighters, so they can read how it should be done ;-)
informative slow-paced

Definitely the most useful and actionable software book I’ve read this year, wrt my job personally.

  • Some parts were a little too in-depth on Google specifically (near the beginning)
  • Some parts were interesting, but felt out of place and hard to grasp from the presented material (Paxos)
  • Could be 20% shorter
  • I don’t know if it’s all still up to date almost 10 years later

crochetspoon's review against another edition

DID NOT FINISH: 67%

Not what I'm looking for right now

Relatively lengthy in many places, but in general great material on SRE fundamentals, useful for any senior role.