About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Sunday, July 21, 2019

Tools To Create Chaos

These are interesting tools that I have come across in the last couple of days for creating chaos, one of the key SRE practices for determining whether your production site can handle excess load ...

  • Gremlin: Chaos as a Service. https://www.gremlin.com/docs/application-layer/attacks/ Resiliency through orchestrated chaos. Worth paying for this service if you have low confidence in the production readiness of your code, or if you don't have SRE practices in place and need to shock the organization into operational readiness.

Now that you have succeeded in creating chaos, how should you instrument and fix the system to deal with it? To understand how to deal with chaos, start with "Health Checks and Graceful Degradation in Distributed Systems" and "Testing in Production, the Safe Way".
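
To make the two ideas above a little more concrete, here is a minimal sketch in Python. It is not taken from either article or from Gremlin: the function names (`check_health`, `with_fallback`) and the two fake dependencies are hypothetical and exist only for illustration. The health check runs a probe per dependency, and the fallback wrapper serves a degraded answer when the primary call fails.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def check_health(dependencies: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each dependency probe and report 'ok' or 'failing' per dependency."""
    report = {}
    for name, probe in dependencies.items():
        try:
            report[name] = "ok" if probe() else "failing"
        except Exception:
            # A probe that raises is treated the same as one that returns False.
            report[name] = "failing"
    return report


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Return primary() if it succeeds, otherwise degrade to fallback()."""
    try:
        return primary()
    except Exception:
        return fallback()


# Hypothetical dependencies, assumed only for this illustration.
def recommendations_service():
    raise TimeoutError("simulated outage injected by a chaos experiment")


def cached_popular_items():
    return ["item-1", "item-2"]


if __name__ == "__main__":
    print(check_health({"db": lambda: True, "recs": recommendations_service}))
    print(with_fallback(recommendations_service, cached_popular_items))
```

In a real service the probes would call actual dependencies, and the fallback would usually be a cache or a simpler default response. The point is only that a chaos experiment should surface as a failing health check and a degraded response, not as an outage.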

To understand the theory and implementation of SRE practices when dealing with chaos, read the chapters on Handling Overload and Addressing Cascading Failures from the SRE books. As a bonus, read the chapter on Non-Abstract Large System Design to understand the process for designing large-scale, fault-tolerant systems.

Lastly, if you are in the Bay Area, this looks like an awesome conference: https://chaosconf.io/

Happy SRE Practices! 
