
A Systematic Survey on Debugging Techniques for Machine Learning Systems

Thanh-Dat Nguyen, Haoye Tian, Bach Le, Patanamon Thongtanunam, Shane McIntosh

TL;DR

This systematic survey catalogs ML debugging techniques and maps them to a real-fault taxonomy to reveal how well research addresses practitioners' needs. It constructs a two-tier taxonomy (fault types and debugging methods) via open/validation coding of 96 papers, and extends Humbatova et al.'s fault taxonomy with newly targeted or emerging faults. The study finds that roughly half of identified debugging challenges are addressed in literature, with a majority of real-world issues on GitHub and in practitioner interviews remaining untargeted, underscoring a significant gap between research and practice. It concludes with concrete implications for researchers and framework developers, emphasizing data processing, interpretability, test quality, data bias, framework usability, and standardization as priority areas to advance ML debugging in real-world deployments.

Abstract

Debugging ML software (i.e., the detection, localization, and fixing of faults) poses unique challenges compared to traditional software, largely due to the probabilistic nature and heterogeneity of its development process. Various methods have been proposed for testing, diagnosing, and repairing ML systems. However, the big picture that would inform research directions addressing developers' most pressing needs has yet to emerge, leaving several key questions unanswered: (1) Which faults targeted by ML debugging research fulfill developers' needs in practice? (2) How are these faults addressed? (3) What are the challenges in addressing the faults that remain untargeted? In this paper, we conduct a systematic study of debugging techniques for machine learning systems. We first collect technical papers that focus on debugging components of machine learning software. We then map these papers to a taxonomy of faults to assess the current state of fault resolution in the existing literature. Next, we analyze which techniques the collected papers use to address specific faults. This results in a comprehensive taxonomy that aligns faults with their corresponding debugging methods. Finally, we examine previously released transcripts of developer interviews to identify the challenges in resolving unfixed faults. Our analysis reveals that only 48% of the identified ML debugging challenges have been explicitly addressed by researchers, while 46.9% remain unresolved or unmentioned. In real-world applications, we found that 52.6% of issues reported on GitHub and 70.3% of problems discussed in interviews are still unaddressed by ML debugging research. The study identifies 13 primary challenges in ML debugging, highlighting a significant gap between the identification of ML debugging issues and their resolution in practice.
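
The coverage figures above rest on aligning a fault taxonomy with debugging techniques, and the GitHub and interview figures are reported in terms of encounters (see Figure 3). The following is a minimal sketch of how such ratios could be computed, not the authors' tooling. It assumes a category counts as "targeted" when at least one surveyed technique is aligned with it, and that the untargeted share for a source is the fraction of that source's encounters landing on untargeted categories. The class `FaultCategory`, the helper functions, and all category names, techniques, and counts are hypothetical toy values, not data from the study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FaultCategory:
    """A fault category with the debugging techniques aligned to it.

    Hypothetical structure for illustration; not the survey's data model.
    """

    name: str
    # Debugging techniques (e.g. surveyed papers) aligned with this fault.
    techniques: list[str] = field(default_factory=list)
    # Encounter counts per evidence source, e.g. {"github": 3, "interviews": 1}.
    encounters: dict[str, int] = field(default_factory=dict)

    @property
    def targeted(self) -> bool:
        # Assumption: a category is targeted if any technique is aligned with it.
        return bool(self.techniques)


def targeted_category_share(faults: list[FaultCategory]) -> float:
    """Percentage of categories with at least one aligned technique."""
    if not faults:
        return 0.0
    return 100.0 * sum(f.targeted for f in faults) / len(faults)


def untargeted_share(faults: list[FaultCategory], source: str) -> float:
    """Percentage of a source's encounters that fall on untargeted categories."""
    total = sum(f.encounters.get(source, 0) for f in faults)
    if total == 0:
        return 0.0
    missed = sum(f.encounters.get(source, 0) for f in faults if not f.targeted)
    return 100.0 * missed / total


# Toy data: category names, techniques, and counts are invented placeholders.
faults = [
    FaultCategory("fault_A", ["technique_1"], {"github": 6, "interviews": 2}),
    FaultCategory("fault_B", [], {"github": 3, "interviews": 5}),
    FaultCategory("fault_C", ["technique_2", "technique_3"], {"github": 1, "interviews": 3}),
    FaultCategory("fault_D", []),
]

print(targeted_category_share(faults))         # 50.0
print(untargeted_share(faults, "github"))      # 30.0
print(untargeted_share(faults, "interviews"))  # 50.0
```

A binary targeted/untargeted flag is a simplification: the survey's 48% and 46.9% figures do not sum to 100%, so its fault statuses likely include further cases that this sketch does not model.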

Paper Structure

This paper contains 35 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Examples of alignments between debugging techniques and corresponding faults. For each paper, we extract key points where the paper mentions the faults it aims to address and align these faults with the corresponding codes (categories) proposed by Humbatova et al. (2020). Where necessary, we expand this taxonomy to accommodate newly emerging categories, which are denoted with black boxes.
  • Figure 2: Taxonomy of alignment: the upper part shows the taxonomy of faults with their status, and the lower part shows the taxonomy of debugging techniques. The arrows align each fault with its corresponding debugging techniques.
  • Figure 3: The first column shows the percentage of all faults that fall into each category. The second column shows the split between targeted and untargeted faults, measured by number of encounters, in GitHub issues, and the last column shows the same split for interviews.
  • Figure 4: Alignment between the ML components addressed/targeted by ML debugging research and the number of observed occurrences of those faulty components on GitHub and in interviews with practitioners.
  • Figure 5: Encounter frequency of each challenge.