A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research

Sicong Cao, Xiaobing Sun, Ratnadira Widyasari, David Lo, Xiaoxue Wu, Lili Bo, Jiale Zhang, Bin Li, Wei Liu, Di Wu, Yixin Chen

TL;DR

The paper addresses the difficulty of deploying AI4SE systems in practice by conducting a systematic literature review of 108 primary studies (2012–2024) across 23 SE tasks, mapping the explainability techniques they use and how those techniques are evaluated. It introduces a practical XAI taxonomy tailored to SE, categorizes common explanation approaches (OT, IM, DK, AM, Others) and formats (Numeric, Text, Visualization, Source Code, Rule), and analyzes baselines, benchmarks, and metrics. The study finds imbalanced coverage across SE tasks, a heavy reliance on out-of-the-box tools, and a lack of standardized evaluation, which points to a need for task-specific customization and better benchmarks. Finally, it offers guidelines for future work and provides an interactive data site to support reproducibility and community contributions, with the aim of accelerating adoption of explainable AI in SE.

Abstract

The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 108 papers across 23 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a set of guidelines highlighting potential opportunities we deemed appropriate and important for future work.

Paper Structure (26 sections, 11 figures, 3 tables)

Figures (11)

  • Figure 1: General taxonomy of the survey in terms of scope, stage, and portability.
  • Figure 2: Distribution of XAI4SE studies across different SE activities and contribution types.
  • Figure 3: Papers published per year according to SE tasks.
  • Figure 4: XAI technique taxonomy & distribution.
  • Figure 5: Sample decision tree used for explainable re-opened bug prediction.
  • ...and 6 more figures