Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs

Yi Xiao; Van-Hoang Le; Hongyu Zhang

TL;DR

The paper tackles two obstacles to LLM-based log parsing at scale: high invocation cost and reliance on demonstrations. It introduces LogBatcher, a training-free, demonstration-free framework that partitions logs, caches parsed templates, and sends logs to the LLM in batches. Key contributions include a log-specific prompting strategy, TF-IDF vectorization with DBSCAN partitioning, a cache-based matching mechanism, and a batching-query workflow that reduces token usage while maintaining or improving accuracy. Experiments on 16 public LogPai-derived datasets show that LogBatcher achieves state-of-the-art results on group accuracy (GA), message-level accuracy (MLA), and edit distance (ED) while substantially lowering LLM invocation costs, making practical deployment more feasible.
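To make the partitioning step concrete, here is a minimal sketch of TF-IDF vectorization followed by DBSCAN clustering, written with the standard library only. It is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the digit-masking rule, the smoothed idf formula, and the `eps` and `min_samples` values are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(log):
    # Assumed tokenizer: split on whitespace and a few delimiters, and mask
    # any token containing a digit so that variable values look alike.
    tokens = [t for t in re.split(r"[\s=:,]+", log.strip()) if t]
    return ["<*>" if any(c.isdigit() for c in t) else t for t in tokens]


def tfidf_vectors(docs):
    # L2-normalized TF-IDF vectors stored as sparse dicts (token -> weight).
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1 (an assumed variant).
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(vectors, eps=0.3, min_samples=1):
    # Plain DBSCAN over cosine distance; label -1 marks noise points.
    labels = [None] * len(vectors)

    def neighbors(i):
        return [j for j, v in enumerate(vectors)
                if cosine_distance(vectors[i], v) <= eps]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def partition_logs(logs, eps=0.3, min_samples=1):
    # Returns one cluster label per input log.
    return dbscan(tfidf_vectors([tokenize(log) for log in logs]), eps, min_samples)


logs = [
    "Connected to host 10.0.0.1 port 8080",
    "Connected to host 10.0.0.2 port 9090",
    "Disk usage at 91 percent",
    "Disk usage at 75 percent",
]
labels = partition_logs(logs)  # [0, 0, 1, 1]
```

The two connection messages land in one partition and the two disk messages in another, so each partition can later be handled as a single group rather than log by log.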

Abstract

Log parsing, the process of converting raw log messages into structured formats, is an important initial step for automated analysis of logs of large-scale software systems. Traditional log parsers often rely on heuristics or handcrafted features, which may not generalize well across diverse log sources or require extensive model tuning. Recently, some log parsers have utilized powerful generative capabilities of large language models (LLMs). However, they heavily rely on demonstration examples, resulting in substantial overhead in LLM invocations. To address these issues, we propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data. To leverage latent characteristics of log data and reduce the overhead, we divide logs into several partitions through clustering. Then we perform a cache matching process to match logs with previously parsed log templates. Finally, we provide LLMs with better prompt context specialized for log parsing by batching a group of logs from each partition. We have conducted experiments on 16 public log datasets and the results show that LogBatcher is effective and efficient for log parsing.
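The cache matching and batching steps described above can be sketched as follows. This is a rough illustration rather than LogBatcher's actual code: `TemplateCache`, `parse_partition`, and the `<*>` wildcard syntax are assumed names and conventions, and `stub_llm` is a hypothetical stand-in for the real LLM call so that the example runs offline.

```python
import re


def template_to_regex(template):
    # Treat every "<*>" placeholder in a template as a wildcard group.
    parts = [re.escape(p) for p in template.split("<*>")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


class TemplateCache:
    """Stores parsed templates and matches new logs against them."""

    def __init__(self):
        self._entries = []  # list of (compiled regex, template)

    def add(self, template):
        self._entries.append((template_to_regex(template), template))

    def match(self, log):
        for pattern, template in self._entries:
            if pattern.match(log):
                return template
        return None


def parse_partition(logs, cache, query_llm, batch_size=10):
    # Step 1: resolve as many logs as possible from the cache.
    results, pending = {}, []
    for log in logs:
        hit = cache.match(log)
        if hit is not None:
            results[log] = hit
        else:
            pending.append(log)
    # Step 2: send one batch per query instead of one log per query.
    while pending:
        batch = pending[:batch_size]
        template = query_llm(batch)
        pattern = template_to_regex(template)
        matched = [log for log in pending if pattern.match(log)]
        if matched:
            cache.add(template)
        else:
            # The template fits nothing: accept it for this batch only,
            # so the loop always makes progress.
            matched = batch
        for log in matched:
            results[log] = template
        pending = [log for log in pending if log not in results]
    return results


def stub_llm(batch):
    # Hypothetical stand-in for an LLM: keep tokens shared by every log in
    # the batch and replace differing positions with "<*>".
    rows = [log.split() for log in batch]
    return " ".join(col[0] if len(set(col)) == 1 else "<*>" for col in zip(*rows))


cache = TemplateCache()
parsed = parse_partition(
    [
        "Connected to 10.0.0.1 port 8080",
        "Connected to 10.0.0.2 port 9090",
        "Connected to 10.0.0.3 port 7070",
    ],
    cache,
    stub_llm,
    batch_size=2,
)
```

In this run a single query on a batch of two logs yields a template that also covers the third log, and later logs of the same shape are answered from the cache without another query. That is the source of the cost saving the abstract refers to.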

Paper Structure (39 sections, 1 equation, 8 figures, 6 tables, 1 algorithm)

Introduction
Background and Related Work
Log Parsing
Log Parsing with Large Language Models
A Motivating Example
Methodology
Partitioning
Tokenization
Vectorization
Clustering & Sorting
Caching
Batching -- Querying
Batching
Prompting Design
Post-Processing
...and 24 more sections

Figures (8)

Figure 1: An Illustration of Log Parsing
Figure 2: Selecting in-context demonstrations for log parsing on Spark (Results are produced using gpt-3.5-turbo with instruction and demonstrations adopted from jiang2023lilac)
Figure 3: An overview of LogBatcher
Figure 4: Log partitioning through clustering
Figure 5: An illustration of our prompt design
...and 3 more figures
