eARCO: Efficient Automated Root Cause Analysis with Prompt Optimization

Drishti Goel, Raghav Magazine, Supriyo Ghosh, Akshay Nambi, Prathamesh Deshpande, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

TL;DR

This paper tackles root cause analysis (RCA) for incidents in large-scale cloud systems by addressing the cost and rigidity of existing LLM-based RCA methods. It introduces eARCO, a framework that combines automatic prompt optimization via PromptWizard with retrieval-augmented in-context learning, plus a pathway to cost-efficient finetuned small language models. Empirical results on 2,900 Microsoft incidents show a 21% improvement in RCA accuracy for large models and a 13% improvement for finetuned small models, validated by both GPT-4-based automated evaluation and human expert judgments. The work demonstrates practical viability for AI for Operations by delivering substantial RCA improvements with lower inference costs and scalable deployment potential.

Abstract

Root cause analysis (RCA) for incidents in large-scale cloud systems is a complex, knowledge-intensive task that often requires significant manual effort from on-call engineers (OCEs). Improving RCA is vital for accelerating incident resolution and reducing service downtime and manual effort. Recent advances in large language models (LLMs) have proven effective across different stages of the incident management lifecycle, including RCA. However, existing LLM-based RCA recommendations typically rely on default finetuning or retrieval-augmented generation (RAG) with static, manually designed prompts, which leads to sub-optimal recommendations. In this work, we leverage 'PromptWizard', a state-of-the-art prompt optimization technique, to automatically identify an optimized prompt instruction, which is then combined with semantically similar historical examples when querying the underlying LLM during inference. Moreover, using more than 180K historical incidents from Microsoft, we developed cost-effective finetuned small language models (SLMs) for RCA recommendation generation and demonstrate the power of prompt optimization on such domain-adapted models. Our extensive experimental results show that prompt optimization can improve the accuracy of RCA recommendations by 21% and 13% on 3K test incidents over RAG-based LLMs and finetuned SLMs, respectively. Lastly, our human evaluation with incident owners has demonstrated the efficacy of prompt optimization on RCA recommendation tasks. These findings underscore the advantages of incorporating prompt optimization into AI for Operations (AIOps) systems, delivering substantial gains without increasing computational overhead.
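
The inference step described in the abstract, an optimized instruction combined with semantically similar historical incidents, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the sample incidents, and the instruction string are hypothetical, and a toy bag-of-words cosine similarity stands in for whatever embedding-based retrieval the authors actually use. It is not PromptWizard and not the eARCO implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for a sentence embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_similar(query, history, k=2):
    """Return the k historical incidents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        history,
        key=lambda inc: cosine(q, bag_of_words(inc["summary"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(instruction, examples, query):
    """Assemble instruction + retrieved examples + new incident."""
    parts = [instruction, ""]
    for ex in examples:
        parts += [
            f"Incident: {ex['summary']}",
            f"Root cause: {ex['root_cause']}",
            "",
        ]
    parts += [f"Incident: {query}", "Root cause:"]
    return "\n".join(parts)


# Hypothetical incident history (invented for this example).
HISTORY = [
    {"summary": "Storage cluster latency spike after certificate rotation",
     "root_cause": "Expired certificate on storage frontend"},
    {"summary": "VM provisioning failures in one region",
     "root_cause": "Quota misconfiguration in the allocator"},
    {"summary": "Login page timeout for web portal users",
     "root_cause": "Connection pool exhaustion in auth service"},
]

# Placeholder for an instruction produced offline by a prompt optimizer.
OPTIMIZED_INSTRUCTION = (
    "You are an on-call engineer. Reason step by step and state "
    "the most likely root cause."
)

query = "Latency spike in storage cluster following certificate update"
examples = retrieve_similar(query, HISTORY, k=2)
prompt = build_prompt(OPTIMIZED_INSTRUCTION, examples, query)
```

The resulting `prompt` string would then be sent to an LLM or a finetuned SLM. The instruction is optimized once, offline, while the examples are retrieved per incident, which is one way to read the abstract's claim that the gains come without added computational overhead at inference time.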

Paper Structure

This paper contains 30 sections, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Optimized prompt instruction identified by PromptWizard.
  • Figure 2: Architecture of the eARCO Framework for Efficient Root Cause Analysis (RCA).