LogPTR: Variable-Aware Log Parsing with Pointer Network

Yifan Wu, Bingxu Chai, Siyu Yu, Ying Li, Pinjia He, Wei Jiang, Jianguo Li

TL;DR

This work addresses log parsing that must also identify variable categories. It introduces LogPTR, an end-to-end variable-aware parser that uses a pointer network to copy words from the log message and label tokens that indicate variable categories. The architecture combines WordPiece subword tokenization, a Bi-LSTM encoder, and a pointer-based decoder, and is trained with maximum likelihood to produce a log template with categorized variables. Evaluated on 16 public LogHub datasets, LogPTR achieves state-of-the-art performance for general parsing (GA ≈ 0.989, PA ≈ 0.972) and superior variable-aware parsing (average PA ≈ 0.972), with robust behavior across datasets. LogPTR requires no handcrafted rules and uses one fixed set of hyperparameters for all datasets, which eases adaptation to new log formats and makes automated log analysis more reliable in practice.
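The summary reports GA and PA without defining them. The sketch below assumes the definitions commonly used in log-parsing benchmarks: parsing accuracy (PA) is the fraction of messages whose predicted template matches the ground truth exactly, and grouping accuracy (GA) is the fraction of messages placed in a group with exactly the same members as their ground-truth group. The function names and the toy templates are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def parsing_accuracy(predicted, truth):
    """Fraction of messages whose predicted template equals the true template."""
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


def grouping_accuracy(predicted, truth):
    """Fraction of messages whose predicted group has exactly the true members."""
    pred_groups = defaultdict(set)
    true_groups = defaultdict(set)
    for i, (p, t) in enumerate(zip(predicted, truth)):
        pred_groups[p].add(i)
        true_groups[t].add(i)
    true_sets = {frozenset(members) for members in true_groups.values()}
    # A predicted group counts only if it is identical to some true group.
    correct = sum(
        len(members)
        for members in pred_groups.values()
        if frozenset(members) in true_sets
    )
    return correct / len(truth)


# Toy data (assumed): the third template is wrong, but the grouping is intact.
truth = ["A <*>", "A <*>", "B <*>", "C"]
predicted = ["A <*>", "A <*>", "B 1", "C"]
pa = parsing_accuracy(predicted, truth)
ga = grouping_accuracy(predicted, truth)
```

On this toy input the grouping is still perfect while one template string is wrong, which illustrates why the two metrics are reported separately.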

Abstract

Due to the sheer size of software logs, developers rely on automated log analysis. Log parsing, which converts semi-structured logs into a structured format, is a prerequisite of automated log analysis. However, existing log parsers are unsatisfactory in practice because they 1) ignore the categories of variables and 2) generalize poorly. To address these limitations, we propose LogPTR, the first end-to-end variable-aware log parser that can extract the static and dynamic parts of logs and further identify the categories of variables. The key idea of LogPTR is to use a pointer network to copy words from the log message. We have performed extensive experiments on 16 public log datasets, and the results show that LogPTR outperforms state-of-the-art log parsers on both general log parsing, which extracts the log template, and variable-aware log parsing, which further identifies the category of each variable.
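The copying idea can be illustrated with a minimal sketch. It assumes the decoder points at positions in an extended input made of the log tokens followed by label tokens, and it uses dot-product scoring over hand-written vectors in place of learned Bi-LSTM states. The label names `<IP>` and `<PORT>`, the function names, and the sample log line are hypothetical; this is not the authors' implementation.

```python
import math

# Hypothetical label tokens; the paper's actual category set is not shown here.
LABEL_TOKENS = ["<IP>", "<PORT>"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_step(decoder_state, encoder_states):
    """Score every input position and return the pointed-at index and distribution.

    Dot-product scoring is an assumption made for brevity.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def decode_template(pointers, log_tokens, label_tokens=LABEL_TOKENS):
    """Map pointed-at positions in the extended input back to template tokens."""
    extended = log_tokens + label_tokens
    return [extended[i] for i in pointers]


# Positions 0-4 are log tokens; positions 5 and 6 are the label tokens.
log_tokens = ["Connected", "to", "10.0.0.1", "port", "8080"]
template = decode_template([0, 1, 5, 3, 6], log_tokens)
```

Because every output token is selected from the input, the decoder never needs an open output vocabulary: static words are copied as they are, and a variable is replaced by pointing at the label token for its category.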

Paper Structure

This paper contains 12 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: An example of log parsing from Spark.
  • Figure 2: The model architecture of LogPTR.
  • Figure 3: Robustness comparison with the state-of-the-art log parsers across different log datasets.