CHAOS ENGINEERING: STRESS-TESTING MICROSERVICES FOR RESILIENCE

Authors

  • Venkata Durga Ganesh Nandigam, Engineering Manager, Nordstrom, USA

Keywords

Chaos Engineering, Microservices Resilience, System Reliability Testing, Failure Injection, Service Dependencies

Abstract

Microservices architectures have become a cornerstone for building scalable and flexible applications in distributed systems. As services grow more complex and interdependent, however, ensuring resilience against failures becomes a significant challenge. Chaos Engineering is a proactive approach that improves the robustness of microservices by intentionally introducing failures and observing how the system behaves under stress. This article explores the principles and practices of Chaos Engineering as a method for stress-testing microservices. It examines the inherent complexities of microservices architectures, including service dependencies, network uncertainties, and emergent behaviors that traditional testing methods may not uncover. Through real-world case studies and implementation strategies, it shows how organizations have adopted Chaos Engineering to identify vulnerabilities, improve system reliability, and build resilient architectures. The article also discusses emerging trends, including AI-driven experimentation, security chaos engineering, and applications in edge computing environments, and outlines future directions for the discipline.

Published

2024-12-05

How to Cite

Venkata Durga Ganesh Nandigam. (2024). CHAOS ENGINEERING: STRESS-TESTING MICROSERVICES FOR RESILIENCE. INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND INFORMATION TECHNOLOGY (IJRCAIT), 7(2), 1993-2004. https://ijrcait.com/index.php/home/article/view/IJRCAIT_07_02_140