Patterns for Resilient Architecture — Part 1

The story of embracing failure at scale

Adrian Hornsby
10 min read · Jul 25, 2018

Part 1 — Embracing Failure at Scale

Part 2 — Avoiding Cascading Failures

Part 3 — Preventing Service Failures with Health Check

Part 4 — Caching for Resiliency

As you may know, a quote that shaped the way I think about architecture is from Werner Vogels, CTO at Amazon.com. He said:

“Failures are a given, and everything will eventually fail over time.”

Having worked on large-scale systems for more than a decade, if I could summarize in a single animation what I think about managing systems at scale and failure, it would be something like this. (This is a real video, and the base jumper survived that failure.)

Resiliency, or the art of being at the edge

But why? Good question, Vincent!

The art of managing systems at scale lies in embracing failure and being at the edge: pushing the limits of your system and software performance almost to breaking point, yet still being able to recover. From the outside it looks both impressive and…


Written by Adrian Hornsby

Former Principal Engineer @ AWS ☁️ I break stuff .. mostly.
