With the rise of cloud-based (microservices) architectures, applications are becoming increasingly distributed and complex. Your business and its consumers depend heavily on these systems, yet failures have become much harder to predict and root cause analysis takes longer.
A service failure can lead to a costly outage and drive customers to shop elsewhere, so building reliable software is a fundamental necessity for modern cloud applications and architectures. Chaos and reliability engineering techniques help you prepare for these kinds of unknown failures and give you a better perspective on how resilient your service or application is.
In this presentation, we’ll look at the principles of chaos and resilience engineering and walk through practical techniques and tools that you can apply within your organisation or project to improve the resilience of your systems.