How to anticipate, monitor, respond to, and continuously learn from failure.

I want to express my gratitude to my colleagues and friends Ricardo Sueiras, Isabel Huerga Ayza, Matt Fitzgerald, Antonio Valle, Aaron Schwam, and Won Huh for their valuable feedback.


Thoughts on chaos engineering and AWS Fault Injection Simulator

In this blog post, I want to look back a little bit because, as Rachel Carson said:

“To understand the living present and the promise of the future, it is necessary to remember the past.” — Rachel Carson

For those who don’t know, Rachel Louise Carson (May 27, 1907 —…


Curated list of my favorite AWS updates from re:Invent 2020

https://reinvent.awsevents.com/

re:Invent 2020 is coming to an end. A lot of new launches have happened since I published Part 1 of this series. Because digesting all the different updates takes time and a lot of coffee, I thought I’d help you out a little.

Following is a curated list of things…


Curated list of my favorite AWS updates from re:Invent 2020

https://reinvent.awsevents.com/

While re:Invent has just started, the first keynote from Andy Jassy has already brought a lot of new launches. I know that digesting all the updates takes time and a lot of coffee, so let me help you.

Following is a curated list of things that I found most important; matters related…


Towards Operational Excellence

I want to express my gratitude to my colleagues and friends Ricardo Sueiras, Matt Fitzgerald, and Boaz Ziniman for their valuable feedback.


A list of my blog posts related to operational excellence.

Series on Operational Excellence

It takes three interconnecting elements to operate the technology we build successfully. First, you need to have the right culture. Second, you need great tools. And third, you need complete processes.

Part 1 of the series covers the cultural side of…


Originally published at https://aws.amazon.com on August 18, 2020 by Varun Jewalikar and Adrian Hornsby

Large-scale distributed software systems are composed of several individual subsystems (such as CDNs, load balancers, and databases) and their interactions. These interactions sometimes have unpredictable outcomes caused by unforeseen turbulent events (for example, a network failure). These events can lead to system-wide failures.

Chaos engineering is the discipline of experimenting on a…


Recording and redacted transcript from my keynote at the AWS Community Day Australia and New Zealand.

First of all, I would like to thank everyone in the AWS Community in Australia and New Zealand and the AWS Heroes who have helped put this event together.

In particular, Augustino (aka Gus), Peter, and Nathan, the AWS Heroes, and John, our OBS Ninja. …


Towards Operational Excellence

I’d like to express my gratitude to my colleague and friend Arni Birgisson for his valuable feedback.

Since I published my blog series Towards Operational Excellence, I have received a relatively large amount of feedback. But one question, in particular, stood out.

“Can you share an incident postmortem template?”

In this…

Adrian Hornsby

Principal, EC2 Core @awscloud ☁️ I break stuff .. mostly. Opinions here are my own.
