CREST: Improving Interpretability and Effectiveness of Troubleshooting at Ericsson through Criterion-Specific Trouble Report Retrieval
Soroush Javdan, Pragash Krishnamoorthy, Olga Baysal
TL;DR
This work introduces CREST, a criterion-specific ensemble for trouble report (TR) retrieval at Ericsson, designed to improve both retrieval effectiveness and interpretability. CREST trains dedicated models for distinct TR observation criteria and aggregates their outputs with learned non-negative weights. In a two-stage Initial Retrieval/Re-Ranking (IR/RR) pipeline it outperforms single-model baselines, and it also provides per-criterion relevance scores that make its results more transparent. The study systematically analyzes the impact of individual criteria, shows calibration improvements measured by Expected Calibration Error (ECE), and validates practical usefulness through a pilot user study. Together, these results suggest that criterion-aware retrieval can accelerate fault resolution and support software maintenance in industrial TR workflows, with potential applicability to Retrieval-Augmented Generation (RAG) and other LLM-based pipelines.
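The aggregation step can be illustrated with a minimal sketch. It assumes that each criterion-specific model already outputs a relevance score in [0, 1] for a (query TR, candidate TR) pair. The criterion names, the logistic aggregator, and the use of projected gradient descent to keep the weights non-negative are all illustrative assumptions, not CREST's published training procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_nonneg_weights(
    scores: Sequence[Sequence[float]],  # one row per (query, TR) pair, one column per criterion
    labels: Sequence[int],              # 1 = relevant, 0 = not relevant
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit w >= 0 and bias b for p = sigmoid(w . s + b) by projected gradient descent on log loss.

    Hypothetical aggregator for illustration only.
    """
    k = len(scores[0])
    n = len(scores)
    w = [1.0 / k] * k  # start from a uniform ensemble
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * k
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(sum(wj * sj for wj, sj in zip(w, s)) + b) - y
            for j in range(k):
                grad_w[j] += err * s[j] / n
            grad_b += err / n
        # Gradient step, then project the weights back onto the non-negative orthant.
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def explain(
    w: Sequence[float], b: float, s: Sequence[float], names: Sequence[str]
) -> Tuple[float, Dict[str, float]]:
    """Return the aggregated relevance probability and each criterion's additive logit contribution."""
    contributions = {name: wj * sj for name, wj, sj in zip(names, w, s)}
    prob = sigmoid(sum(contributions.values()) + b)
    return prob, contributions
```

Because the weights are non-negative and the criterion scores lie in [0, 1], each criterion's contribution can only raise the aggregated logit. This is one way to keep the per-criterion scores readable as evidence for relevance. A criterion that does not help on the training pairs is simply driven to weight zero.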
Abstract
The rapid evolution of the telecommunication industry necessitates efficient troubleshooting processes to maintain network reliability, software maintainability, and service quality. Trouble Reports (TRs), which document issues in Ericsson's production system, play a critical role in facilitating the timely resolution of software faults. However, the complexity and volume of TR data, along with the presence of diverse criteria that reflect different aspects of each fault, present challenges for retrieval systems. Building on prior work at Ericsson that used a two-stage workflow comprising Initial Retrieval (IR) and Re-Ranking (RR), this study investigates different TR observation criteria and their impact on the performance of retrieval models. We propose \textbf{CREST} (\textbf{C}riteria-specific \textbf{R}etrieval via \textbf{E}nsemble of \textbf{S}pecialized \textbf{T}R models), a criterion-driven retrieval approach that leverages specialized models for different TR fields to improve both effectiveness and interpretability, thereby enabling quicker fault resolution and supporting software maintenance. CREST uses specialized models trained on specific TR criteria and aggregates their outputs to capture diverse and complementary signals. This approach improves retrieval accuracy and the calibration of predicted scores. It also improves interpretability by providing a relevance score for each criterion, which helps users understand why specific TRs were retrieved. Using a subset of Ericsson's internal TRs, this research demonstrates that criterion-specific models significantly outperform a single-model approach across key evaluation metrics. This highlights the importance of every criterion targeted in this study for optimizing the performance of retrieval systems.
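Calibration of predicted relevance scores is commonly quantified with binned Expected Calibration Error (ECE). ECE is the sample-weighted mean, over score bins, of the gap between the fraction of truly relevant pairs and the mean predicted probability. The sketch below is one such formulation for binary relevance. The equal-width binning, the default of 10 bins, and the function name are assumptions, since the source does not state how ECE was computed.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE for binary relevance (hypothetical helper, equal-width bins over [0, 1])."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_relevant = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by the share of samples it holds.
        ece += len(members) / n * abs(frac_relevant - mean_prob)
    return ece
```

A lower value means predicted scores track observed relevance rates more closely. A perfectly calibrated set of scores yields 0.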
