Scalability and Maintainability Challenges and Solutions in Machine Learning: Systematic Literature Review

Karthik Shivashankar, Ghadi S. Al Hajj, Antonio Martini

TL;DR

This systematic literature review tackles the dual imperatives of scalability and maintainability in ML systems, identifying 41 maintainability and 13 scalability challenges across data engineering, model engineering, and the broader ML ecosystem. It formalizes a set of concrete solutions, maps interdependencies among lifecycle stages, and analyzes tradeoffs to guide practitioners toward balanced, robust ML deployments. The study aggregates insights on data governance, drift, hyperparameter optimization (HPO), MLOps, and deployment governance, illustrating how improvements in one area affect others. Overall, the work provides a comprehensive, actionable framework for designing scalable, maintainable ML systems and highlights avenues for future empirical validation and tooling enhancements.

Abstract

This systematic literature review examines the critical challenges and solutions related to scalability and maintainability in Machine Learning (ML) systems. As ML applications become increasingly complex and widespread across industries, the need to balance system scalability with long-term maintainability has emerged as a significant concern. This review synthesizes current research and practices addressing these dual challenges across the entire ML life-cycle, from data engineering to model deployment in production. We analyzed 124 papers to identify and categorize 41 maintainability challenges and 13 scalability challenges, along with their corresponding solutions. Our findings reveal intricate interdependencies between scalability and maintainability, where improvements in one often impact the other. The review is structured around six primary research questions, examining maintainability and scalability challenges in data engineering, model engineering, and ML system development. We explore how these challenges manifest differently across various stages of the ML life-cycle. This comprehensive overview offers valuable insights for both researchers and practitioners in the field of ML systems. It aims to guide future research directions, inform best practices, and contribute to the development of more robust, efficient, and sustainable ML applications across various domains.

Paper Structure

This paper contains 110 sections, 6 figures, and 17 tables.

Figures (6)

  • Figure 1: Systematic Literature Review Process
  • Figure 2: Number of Papers by Years
  • Figure 3: Distribution of Publication Type
  • Figure 4: Distribution of Authors Affiliation
  • Figure 5: Authors Distribution by Region
  • ...and 1 more figure