The Chaos Engineering Collection

A list of my blog posts and open-source projects on chaos engineering.

Series on chaos engineering

This is a collection of three articles on chaos engineering that present and discuss the different phases of the chaos engineering process.

Part 1: The art of breaking things purposefully

In Part 1 of this series, I introduce chaos engineering and explain how it helps uncover and fix unknowns in your system before they become outages in production. I also explain how it fosters positive cultural change inside organizations.

Part 2: Planning your first experiment

In Part 2, I discuss the areas to invest in before designing your first chaos engineering experiments, and how to pick the right hypothesis.

Part 3: Failure Injection — Tools and Methods

In Part 3, I focus on the experiment itself and present a collection of tools and methods that cover the broad spectrum of failure injection necessary for running chaos engineering experiments.

Practical Chaos Engineering

A set of articles presenting practical implementations of chaos engineering experiments.

Building resilient services at Prime Video with chaos engineering

In this article, we present an open-source library, AWSSSMChaosRunner, based on AWS Systems Manager. We also provide an integration example and explain how Prime Video used this library to prevent potentially customer-impacting outages.

Creating your own Chaos Monkey with AWS Systems Manager Automation

In this article, I show how to create your own Chaos Monkey (randomly killing EC2 instances) and share open-source, ready-made failure injections to help you get started. Give it a try, you won’t regret it!
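To give a feel for the "randomly killing EC2 instances" idea, here is a minimal Python sketch of the selection step only. It is not the Systems Manager Automation document from the article. The function name `pick_victim`, the `chaos-monkey` opt-in tag, and the simplified instance dictionaries are all assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def pick_victim(
    instances: Sequence[dict],
    opt_in_tag: str = "chaos-monkey",  # hypothetical tag name
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the ID of one randomly chosen opted-in instance, or None.

    Each instance is a simplified dict such as
    {"InstanceId": "i-aaa", "Tags": {"chaos-monkey": "true"}}.
    This shape is an assumption; the real EC2 API returns tags as a
    list of {"Key": ..., "Value": ...} pairs.
    """
    rng = rng or random.Random()
    # Only instances that explicitly opted in via the tag are eligible,
    # which keeps the blast radius under control.
    candidates = [
        inst["InstanceId"]
        for inst in instances
        if inst.get("Tags", {}).get(opt_in_tag) == "true"
    ]
    if not candidates:
        return None
    return rng.choice(candidates)


if __name__ == "__main__":
    fleet = [
        {"InstanceId": "i-aaa", "Tags": {"chaos-monkey": "true"}},
        {"InstanceId": "i-bbb", "Tags": {}},
        {"InstanceId": "i-ccc", "Tags": {"chaos-monkey": "true"}},
    ]
    victim = pick_victim(fleet)
    # A real run would hand `victim` to whatever performs the termination,
    # for example boto3's ec2.terminate_instances(InstanceIds=[victim]).
    print(f"Would terminate: {victim}")
```

Keeping the selection logic separate from the termination call makes it easy to dry-run the experiment and review which instances are eligible before anything is actually stopped.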

Written by

Principal Developer Advocate, Architecture @awscloud ☁️ I break stuff .. mostly. Opinions here are my own.
