CREST: Improving Interpretability and Effectiveness of Troubleshooting at Ericsson through Criterion-Specific Trouble Report Retrieval

Soroush Javdan, Pragash Krishnamoorthy, Olga Baysal

TL;DR

This work introduces CREST, a criterion-specific ensemble for trouble report retrieval in Ericsson, designed to improve both retrieval effectiveness and interpretability. By training dedicated models for distinct TR observation criteria and aggregating their outputs in a learned, non-negative-weighted fashion, CREST achieves superior performance over single-model baselines in a two-stage IR/RR pipeline, while also providing per-criterion relevance scores that enhance transparency. The study systematically analyzes the impact of individual criteria, demonstrates calibration improvements via Expected Calibration Error, and validates practical usefulness through a pilot user study. Together, these results suggest that criterion-aware retrieval can accelerate fault resolution and support software maintenance in industrial TR workflows, with potential applicability to Retrieval-Augmented Generation and other LLM-based pipelines.
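The aggregation step described above can be illustrated with a minimal sketch. It assumes that each criterion-specific model returns a relevance score in [0, 1], that non-negativity is enforced by passing unconstrained parameters through a softplus, and that the weights are normalized to sum to one. The summary only states that the weights are learned and non-negative, so the softplus, the normalization, and the criterion names (`criterion_a`, etc.) are illustrative assumptions rather than the authors' implementation.

```python
from __future__ import annotations

import math


def softplus(x: float) -> float:
    """Map any real number to a strictly positive value (assumed constraint)."""
    return math.log1p(math.exp(x))


def aggregate(
    scores: dict[str, float], raw_weights: dict[str, float]
) -> tuple[float, dict[str, float]]:
    """Combine per-criterion scores into one relevance score.

    Returns the aggregated score and each criterion's contribution, so the
    per-criterion breakdown stays visible to the user.
    """
    # Hypothetical parameterization: raw weights are unconstrained, softplus
    # makes them non-negative, and normalization makes them sum to one.
    weights = {c: softplus(raw_weights[c]) for c in scores}
    total = sum(weights.values())
    contributions = {c: weights[c] / total * scores[c] for c in scores}
    return sum(contributions.values()), contributions


# Placeholder criterion names and scores, for illustration only.
scores = {"criterion_a": 0.9, "criterion_b": 0.3, "criterion_c": 0.6}
raw_weights = {"criterion_a": 0.0, "criterion_b": 0.0, "criterion_c": 0.0}
final_score, per_criterion = aggregate(scores, raw_weights)
```

Because every weight is non-negative, each criterion can only add to the aggregated score, which is what makes the per-criterion contributions readable as an explanation of why a TR was retrieved.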

Abstract

The rapid evolution of the telecommunication industry necessitates efficient troubleshooting processes to maintain network reliability, software maintainability, and service quality. Trouble Reports (TRs), which document issues in Ericsson's production system, play a critical role in facilitating the timely resolution of software faults. However, the complexity and volume of TR data, along with the presence of diverse criteria that reflect different aspects of each fault, present challenges for retrieval systems. Building on prior work at Ericsson, which utilized a two-stage workflow, comprising Initial Retrieval (IR) and Re-Ranking (RR) stages, this study investigates different TR observation criteria and their impact on the performance of retrieval models. We propose CREST (Criteria-specific Retrieval via Ensemble of Specialized TR models), a criterion-driven retrieval approach that leverages specialized models for different TR fields to improve both effectiveness and interpretability, thereby enabling quicker fault resolution and supporting software maintenance. CREST utilizes specialized models trained on specific TR criteria and aggregates their outputs to capture diverse and complementary signals. This approach leads to enhanced retrieval accuracy, better calibration of predicted scores, and improved interpretability by providing relevance scores for each criterion, helping users understand why specific TRs were retrieved. Using a subset of Ericsson's internal TRs, this research demonstrates that criterion-specific models significantly outperform a single model approach across key evaluation metrics. This highlights the importance of all targeted criteria used in this study for optimizing the performance of retrieval systems.
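The calibration claim can be made concrete with a small sketch of Expected Calibration Error in its common binned form for binary relevance: predictions are grouped into equal-width confidence bins, and the gap between the mean predicted score and the observed fraction of relevant items is averaged with weights proportional to bin size. The number of bins and the binary-relevance framing are assumptions here; the abstract does not specify how the authors configure the metric.

```python
from __future__ import annotations


def expected_calibration_error(
    probs: list[float], labels: list[int], n_bins: int = 10
) -> float:
    """Binned ECE for predicted relevance probabilities and 0/1 labels."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin.
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(p for p, _ in members) / len(members)
        fraction_relevant = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by the share of samples it holds.
        ece += len(members) / n * abs(mean_confidence - fraction_relevant)
    return ece


# Toy data: scores of 0.8 where 4 of 5 candidates are actually relevant.
well_calibrated = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])
# Toy data: confident scores for candidates that are all irrelevant.
over_confident = expected_calibration_error([0.9, 0.9], [0, 0])
```

A lower value means the predicted scores track observed relevance more closely, which is the sense in which better calibration makes per-criterion scores easier to trust.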

Paper Structure

This paper contains 28 sections, 2 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Overview of the utilized TR recommendation system.
  • Figure 2: An example of the TR observation field with different criteria.
  • Figure 3: Distribution of TR observation criteria based on the token length.
  • Figure 4: Overview of CREST in a two-stage pipeline: bi-encoders (Twin/ColRoBERTa) retrieve top-K candidates and a cross-encoder (monoRoBERTa) re-ranks them. Unlike the baseline criteria-agnostic two-stage workflow shown in Figure 1, CREST adds criterion-specific models whose per-criterion scores are aggregated into the final relevance score.
  • Figure 5: Mockup of the CREST interface showing selectable criteria and both disaggregated (per-criterion) and aggregated relevance scores, enabling configurable focus and clearer rationale for retrieved results.
  • ...and 1 more figures