Quality Issues in Machine Learning Software Systems
Pierre-Olivier Côté, Amin Nikanjam, Rached Bouchoucha, Ilan Basta, Mouna Abidi, Foutse Khomh
TL;DR
The paper addresses quality assurance for Machine Learning Software Systems (MLSSs) through an empirical, practitioner-centered study. Through 42 interviews (with 36 analyzed) and a validation questionnaire, it identifies 18 recurring quality issues across six dimensions (evaluability, explainability, debuggability, efficiency, maintainability, reliability) and catalogs 21 mitigation strategies, along with data-quality challenges that arise during model evolution. It finds that evaluability, debuggability, and reliability are the most prevalent concerns, and that data completeness and accuracy are the primary data-quality problems, with significant drift-related challenges during evolution. The authors provide practical recommendations for data quality, evaluability, explainability, debuggability, efficiency, and reliability, and release a replication package to support tool development and independent replication, with the aim of advancing robust QA tooling for ML-enabled software systems.
Abstract
Context: Demand for using Machine Learning (ML) to solve complex problems is increasing across various domains. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Problem: There is a strong need to ensure the serving quality of MLSSs. False or poor decisions by such systems can lead to the malfunction of other systems, significant financial losses, or even threats to human life. The quality assurance of MLSSs is considered a challenging task and is currently an active research topic. Objective: This paper investigates the characteristics of real quality issues in MLSSs from the viewpoint of practitioners, with the aim of building a catalog of such issues. Method: We conduct a set of interviews with practitioners and experts to gather insights about their experience and practices when dealing with quality issues. We validate the identified quality issues via a survey with ML practitioners. Results: Based on the content of 37 interviews, we identified 18 recurring quality issues and 21 strategies to mitigate them. For each identified issue, we describe the causes and consequences according to the practitioners' experience. Conclusion: We believe the catalog of issues developed in this study will allow the community to develop efficient quality assurance tools for ML models and MLSSs. A replication package of our study is available on our public GitHub repository.
