Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs

Shiwen Shan, Yintong Huo, Yuxin Su, Yichen Li, Dan Li, Zibin Zheng

TL;DR

Misconfigurations in configurable software are common and hard to diagnose from code alone. The authors propose a two-stage strategy based on Large Language Models to localize root-cause configuration properties from end-user logs, implemented in LogConfigLocalizer. They validate the approach on Hadoop, achieving an average diagnostic accuracy of $99.91\%$, and illustrate feasibility through a practical case study with $93.94\%$ accuracy. The work offers a practical end-user-focused solution that does not require access to source code, potentially reducing the time and effort required to remedy misconfigurations.

Abstract

Configurable software systems are prone to configuration errors, resulting in significant losses to companies. However, diagnosing these errors is challenging due to the vast and complex configuration space. These errors pose significant challenges for both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Given that logs are easily accessible to most end-users, we conduct a preliminary study to outline the challenges and opportunities of utilizing logs in localizing configuration errors. Based on the insights gained from the preliminary study, we propose an LLM-based two-stage strategy for end-users to localize the root-cause configuration properties based on logs. We further implement a tool, LogConfigLocalizer, aligned with the design of this strategy, to assist end-users in coping with configuration errors through log analysis. To the best of our knowledge, this is the first work to localize the root-cause configuration properties for end-users based on Large Language Models (LLMs) and logs. We evaluate the proposed strategy on Hadoop using LogConfigLocalizer and show its effectiveness, with an average accuracy as high as 99.91%. We also demonstrate the effectiveness and necessity of the different phases of the methodology by comparing it with two other variants and a baseline tool. Finally, we validate the proposed methodology through a practical case study to demonstrate its effectiveness and feasibility.

Paper Structure (33 sections, 2 equations, 6 figures, 6 tables, 1 algorithm)

Figures (6)

  • Figure 1: Two Types of Anomaly Symptoms in Logs
  • Figure 2: Overview of the LLM-based Two-Stage Strategy
  • Figure 3: System Prompt in the Verification Phase
  • Figure 4: System Prompt in the Indirect Inference Phase
  • Figure 5: Comparison with ConfDiagDetector based on five workloads. The metric Other-Hit denotes LLM-Hit Counts for LogConfigLocalizer and NLP-Hit Counts for ConfDiagDetector.
  • ...and 1 more figure