Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs

Shiwen Shan; Yintong Huo; Yuxin Su; Yichen Li; Dan Li; Zibin Zheng

TL;DR

Misconfigurations in configurable software are common and hard to diagnose from code alone. The authors propose a two-stage strategy based on Large Language Models to localize root-cause configuration properties from end-user logs, implemented in LogConfigLocalizer. They validate the approach on Hadoop, achieving an average diagnostic accuracy of 99.91%, and illustrate feasibility through a practical case study with 93.94% accuracy. The work offers a practical end-user-focused solution that does not require access to source code, potentially reducing the time and effort required to remedy misconfigurations.

Abstract

Configurable software systems are prone to configuration errors, resulting in significant losses to companies. However, diagnosing these errors is challenging due to the vast and complex configuration space. These errors pose significant challenges for both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Given that logs are easily accessible to most end-users, we conduct a preliminary study to outline the challenges and opportunities of utilizing logs in localizing configuration errors. Based on the insights gained from the preliminary study, we propose an LLM-based two-stage strategy for end-users to localize the root-cause configuration properties based on logs. We further implement a tool, LogConfigLocalizer, aligned with the design of this strategy, to assist end-users in coping with configuration errors through log analysis. To the best of our knowledge, this is the first work to localize the root-cause configuration properties for end-users based on Large Language Models (LLMs) and logs. We evaluate the proposed strategy on Hadoop using LogConfigLocalizer and show that it achieves an average accuracy as high as 99.91%. We also demonstrate the effectiveness and necessity of the different phases of the methodology by comparing it with two other variants and a baseline tool. Finally, we validate the proposed methodology through a practical case study to demonstrate its effectiveness and feasibility.

Paper Structure (33 sections, 2 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Background
Problem Definition
Preliminary Study
Configuration Bug Localization
Overview
Anomaly Identification Stage
Log Parsing
Specific Templates Extraction
Anomaly Degree Calculation
Log Template Recovery
Anomaly Inference Stage
Direct Inference
LLM-powered Verification
LLM-based Indirect Inference
...and 18 more sections

Figures (6)

Figure 1: Two Types of Anomaly Symptoms in Logs
Figure 2: Overview of the LLM-based Two-Stage Strategy
Figure 3: System Prompt in the Verification Phase
Figure 4: System Prompt in the Indirect Inference Phase
Figure 5: Comparison with ConfDiagDetector on five workloads. The metric Other-Hit denotes LLM-Hit counts for LogConfigLocalizer and NLP-Hit counts for ConfDiagDetector.
...and 1 more figure
