Static Stability: When ‘Doing Nothing’ Is Your Best Incident Response

Resilience Bites

2 min readApr 27, 2025

“Doing nothing” is often the most productive response to an incident.

This counterintuitive approach has to do with something called static stability, a very important principle to understand when building resilient software systems.

Static stability originates from classical mechanics and engineering physics. It describes a system’s ability to return to its equilibrium state after a small displacement when no external forces are applied.

The concept became particularly influential in the aviation industry, where aircraft are typically designed to be statically stable so they can return to their original position even when disturbed by turbulence, without requiring pilot intervention.

Though an oversimplification, consider an airplane equipped with four engines. It can maintain its flight, almost as if nothing were wrong, even when one or two of its engines are temporarily shut down. No immediate action is required; the aircraft can rely on the remaining engines to sustain its course.

The remaining engines have enough power to compensate for the loss of the other two engines. In this configuration, the airplane demonstrates static stability.

This idea is similarly applicable to software systems. A software system is statically stable if it maintains its original steady-state (behavior) even when subjected to disruptive events without having to do any work.

A simple example of static stability can be found in AWS Lambda. Lambda achieves static stability by pre-provisioning extra capacity across multiple Availability Zones.

By doing that, it ensures that if one zone becomes impaired, the remaining zones can automatically absorb the traffic without requiring immediate control plane operations. No need to start instances. No need to scale. It just works.

By designing statically stable services, Lambda can maintain its steady-state and 99.95% monthly availability commitment.

Of course, designing statically stable services means running over-provisioned resources, but that is the price for resilience. It is always in tension with efficiency.

Many AWS services are designed with this principle in mind, prioritizing resilience through static stability rather than complex recovery mechanisms.

You can read about this principle applied to AWS Lambda in the article AWS Lambda Resilience Under the Hood that I wrote with my former colleague Marcia Villalba.

— —

If you struggle to improve resilience in your organization, contact me. I’ve spent the better part of a decade helping organizations transform their approach to system resilience and chaos engineering.

--

--

Adrian Hornsby
Adrian Hornsby

Written by Adrian Hornsby

I help software organizations improve resilience and achieve operational excellence | Former Principal Engineer at AWS | Follow for posts on resilience

No responses yet