In this podcast episode with Adrian Hornsby, we explored the critical topic of system resilience and its evolution. Adrian shared his journey into resilience engineering, which began after accidentally deleting a production database 15 years ago. This incident led him to dive deep into understanding both the technical and human aspects of building resilient systems, eventually leading to his role at AWS and his current focus on resilience consulting.
The discussion highlighted how resilience is often misunderstood as purely a technical challenge when it actually depends on three key elements: culture, tools, and processes. Adrian emphasized that organizations frequently make the mistake of trying to solve resilience problems by only implementing new tools, without addressing the cultural and procedural aspects. We also explored the parallel between security and resilience, noting how both suffer from the “prevention paradox” – where successful prevention makes it hard to justify continued investment.
Key Points and Podcast
00:00 Introduction
00:40 🎯 Adrian’s journey into resilience engineering started with a painful incident – accidentally deleting a production database, which led him to explore how to prevent such issues proactively rather than reactively.
03:31 💡 Resilience engineering depends on three critical pillars: culture, tools, and processes. Having just one or two isn’t enough – they must work together harmoniously for true resilience.
10:41 🔄 There’s a delicate balance between making systems secure/resilient and keeping them flexible enough for emergency interventions. Too much rigidity can be as problematic as too little security.
27:42 🤖 With AI systems becoming more prevalent, there’s a growing challenge of “black box complexity” where systems make decisions autonomously, making it harder to understand and fix issues when they arise.
31:31 🛡️ Resilience faces the “prevention paradox” – when prevention works perfectly, nothing bad happens, making it hard to justify the investment because success means nothing notable occurred (similar to Y2K preparations).
36:40 🔗 Resilience and security are “brother and sister” disciplines that face similar challenges – both are continuous processes (not one-time projects) and both struggle with proving their value when they’re working well.

Leave a Reply