OPTIMIZING AI PERFORMANCE: STRATEGIES FOR SCALABLE MODEL OBSERVABILITY

Sandeep Bharadwaj Mannapur

Authors

Sandeep Bharadwaj Mannapur, Jawaharlal Nehru Technological University, Hyderabad, India

Keywords:

Model Observability, Machine Learning Monitoring, Automated Root Cause Analysis, Self-Healing Systems, Production AI Maintenance

Abstract

This article analyzes advanced model observability techniques, focusing on two leading platforms: Evidently AI and Arthur AI. It examines why production machine learning systems, whose models face dynamic environments and evolving data distributions, need robust monitoring. Through case studies in healthcare, financial services, and smart city implementations, it demonstrates how these platforms help organizations maintain model performance, ensure regulatory compliance, and optimize resource utilization. It also covers implementation strategies, key features, and emerging trends in model observability, and addresses the challenges of scaling these solutions across distributed architectures. Special attention is given to automated root cause analysis, unified observability platforms, and self-healing capabilities powered by deep learning.

