Monitoring Chaos Engineering Experiments with Datadog
Tutorial — Using Datadog Alarms with Amazon EventBridge to Safely Monitor AWS Fault Injection Simulator Experiments
Many businesses and organizations adopt cloud computing as a key aspect of their technology strategy. While most applications can be migrated easily, some need modernization before they can be moved to the cloud. Other applications must remain on-premises because of extreme low-latency requirements or data sovereignty and residency requirements. This leads many organizations to adopt hybrid- or multi-cloud architectures, which make it harder to maintain visibility across the complete landscape.
As workloads move to these hybrid- and multi-cloud setups, performance monitoring can become difficult, leaving engineers without the data and visibility they need to do their jobs. Often, they have data from individual cloud providers but don’t have a consistent view across their services. This makes it challenging to understand their system’s characteristics and what might be causing issues.
In addition, this added infrastructure complexity pushes many of these organizations to adopt chaos engineering in order to identify weaknesses in their complex…