Thoughts on chaos engineering and AWS Fault Injection Simulator


In this blog post, I want to look back a little bit because, as Rachel Carson said:

“To understand the living present and the promise of the future, it is necessary to remember the past.” — Rachel Carson

I love this quote and the work Rachel has done for our world — and today, I want to apply her wisdom to chaos…


Curated list of my favorite AWS updates from re:Invent 2020

https://reinvent.awsevents.com/

re:Invent 2020 is coming to an end. A lot of new launches have happened since I published Part 1 of this series. Because digesting all the different updates takes time and a lot of coffee, I thought I’d help you out a little.

Following is a curated list of the things I found most important: matters related to architecture, scalability, reliability, performance, resiliency, DevOps, and security. In short, anything that caught my eye and, I hope, will catch yours.

AWS Fault Injection Simulator (coming in 2021)

AWS Fault Injection Simulator is a fully managed chaos engineering service that makes it easier for teams to discover an application’s weaknesses…


Curated list of my favorite AWS updates from re:Invent 2020

https://reinvent.awsevents.com/

re:Invent has only just started, and the first keynote from Andy Jassy already included a lot of new launches. I know that digesting all the updates takes time and a lot of coffee, so let me help you.

Following is a curated list of the things I found most important: matters related to architecture, scalability, reliability, performance, resiliency, DevOps, and security. In short, anything that caught my eye and, I hope, will catch yours.

Amazon S3 now delivers strong read-after-write consistency automatically for all applications

This is hands-down my favorite launch!

Amazon S3 now delivers strong read-after-write consistency automatically for all applications for any storage request, without changes to performance or availability, without sacrificing…


Towards Operational Excellence


I want to express my gratitude to my colleagues and friends Ricardo Sueiras, Matt Fitzgerald, and Boaz Ziniman for their valuable feedback.


A list of my blog posts on operational excellence.


Series on Operational Excellence

It takes three interconnecting elements to operate the technology we build successfully. First, you need to have the right culture. Second, you need great tools. And third, you need complete processes.

Part 1 of the series covers the cultural side of Operational Excellence (OE) and examines Amazon’s culture in the context of its Leadership Principles (LPs). Part 2 discusses the role that tools play in achieving OE. Part 3 covers the final aspect of operational excellence: processes, or what we call mechanisms.

Below is the AWS Summit 2020 recording…


Originally published at https://aws.amazon.com on August 18, 2020 by Varun Jewalikar and Adrian Hornsby


Large-scale distributed software systems are composed of several individual subsystems (such as CDNs, load balancers, and databases) and their interactions. These interactions sometimes have unpredictable outcomes caused by unforeseen turbulent events (for example, a network failure). These events can lead to system-wide failures.

Chaos engineering is the discipline of experimenting on a distributed system to build confidence in the system’s capability to withstand turbulent events. It requires adopting practices that proactively identify interactions in distributed systems and the failures related to them, and then implementing and validating countermeasures. The key to chaos engineering is injecting failure in a controlled manner.
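
To make "injecting failure in a controlled manner" a little more concrete, here is a minimal Python sketch. It is only an illustration and does not come from the original post or from any AWS service: the names `inject_fault`, `InjectedFault`, and `call_with_fallback` are hypothetical. The idea is that a fraction of calls fail on purpose, a switch turns the experiment off at once, and a fallback plays the role of the countermeasure being validated.

```python
import functools
import random


class InjectedFault(Exception):
    """Hypothetical exception raised in place of a real failure."""


def inject_fault(rate, enabled=True, rng=None):
    """Wrap a function so that roughly `rate` of its calls fail on purpose.

    `rate` is a probability between 0 and 1. `enabled` is the off switch
    that keeps the experiment controlled. `rng` can be a seeded
    random.Random instance to make a run repeatable.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Fail only while the experiment is enabled and only for the
            # configured share of calls; otherwise behave normally.
            if enabled and rng.random() < rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


def call_with_fallback(func, fallback):
    """Countermeasure under test: return `fallback` when the call fails."""
    try:
        return func()
    except InjectedFault:
        return fallback
```

Wrapping a dependency call with `inject_fault(rate=0.1)` and routing it through `call_with_fallback` would let roughly one call in ten exercise the fallback path, while `enabled=False` restores normal behavior. A real experiment would also need monitoring and stop conditions, which this sketch leaves out.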

In this post…


Recording and redacted transcript from my keynote at the AWS Community Day Australia and New Zealand.


First of all, I would like to thank everyone in the AWS Community in Australia and New Zealand and the AWS Heroes who have helped put this event together.

In particular, Augustino (aka Gus), Peter, and Nathan, the AWS Heroes, and John, our OBS Ninja. All have done amazing work building this Community.

Thank you!

A little reminder that if you plan to share your day with others on social media, please use the hashtag:

#AWSCommunityDayANZ

A few years ago, I gave a talk called “ten lessons from ten years on AWS.”

That was at the Community…


Towards Operational Excellence


I’d like to express my gratitude to my colleague and friend Arni Birgisson for his valuable feedback.

Since I published my blog series Towards Operational Excellence, I have received quite a lot of feedback. One question in particular stood out.

“Can you share an incident postmortem template?”

What is a postmortem?

A postmortem is a process where…


Chaos Engineering on AWS


I’d like to express my gratitude to my colleagues and friends Jason Byrne and Matt Fitzgerald for their valuable feedback.


Chaos Engineering with Canary deployments


I recently did a two-hour webinar dedicated to chaos engineering and got a lot of great questions from the audience. In this mini-series of posts, I take some time to answer them.

If you missed the webinar, you can access it on-demand from the link below. And if you have questions you would like me to address, feel free to ask me directly on Twitter :-)

One of the most commonly asked questions about chaos engineering is:

“How to safely inject failure in your application?”

Adrian Hornsby

Principal Technical Evangelist, Architecture @awscloud ☁️ I break stuff .. mostly. Opinions here are my own.
