Triage in Software Engineering: A Systematic Review of Research and Practice

Yongxin Zhao, Shenglin Zhang, Yujia Wu, Yuxin Sun, Yongqian Sun, Dan Pei, Chetan Bansal, Minghua Ma

TL;DR

This paper presents a systematic review of triage in software engineering, examining how bugs, incidents, and alerts are processed, prioritized, and assigned across the full lifecycle. It synthesizes 234 studies (2004–present) and maps data processing, prioritization, assignment, and postmortem feedback into an end-to-end triage framework, highlighting open datasets, evaluation metrics, and practical challenges. The review reveals methodological progress (e.g., deduplication, ML/PLM-based prioritization, graph/LLM-enabled assignment) and persistent barriers (data quality, concept drift, scalability, reproducibility), and it documents rising industry collaboration and real-world deployment considerations. The authors propose future directions (multimodal data fusion, knowledge-grounded models, human-in-the-loop learning, and better generalization) aimed at closing the gap between research prototypes and deployable triage pipelines that integrate into modern software operations.

Abstract

As modern software systems continue to grow in complexity, triage has become a fundamental process in system operations and maintenance. Triage aims to efficiently prioritize, assign, and assess issues to ensure the reliability of complex environments. The vast amount of heterogeneous data generated by software systems has made effective triage indispensable for maintaining reliability, facilitating maintainability, and enabling rapid issue response. Motivated by these challenges, researchers have devoted extensive effort to advancing triage automation and have achieved significant progress over the past two decades. This survey provides a comprehensive review of 234 papers from 2004 to the present, offering an in-depth examination of the fundamental concepts, system architecture, and problem statement. By comparing the distinct goals of academic and industrial research and by analyzing empirical studies of industrial practices, we identify the major obstacles that limit the practical deployment of triage systems. To assist practitioners in method selection and performance evaluation, we summarize widely adopted open-source datasets and evaluation metrics, providing a unified perspective on the measurement of triage effectiveness. Finally, we outline potential future directions and emerging opportunities to foster a closer integration between academic innovation and industrial application. All reviewed papers and projects are available at https://github.com/AIOps-Lab-NKU/TriageSurvey.

Paper Structure

This paper contains 66 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: A conceptual map of automated triage research and practice.
  • Figure 2: Analysis of Publication Trends on Triage in Software Engineering.
  • Figure 3: Structure of this survey.
  • Figure 4: The general lifecycle of triage in SE.
  • Figure 5: Publication distribution of distinct venues.