Optimized Log Parsing with Syntactic Modifications
Nafid Enan, Gias Uddin
TL;DR
This work benchmarks syntax-based versus semantic-based log parsers and single-phase versus two-phase architectures. It finds that syntax-based approaches excel at grouping, while semantic-based models excel at template identification, though at much higher cost and with limited generalization. It introduces SynLog+, a lightweight template identification module that serves as the second phase of a two-phase parser, combining a syntax-based grouper with regex-driven anonymization to boost parsing accuracy with minimal runtime overhead. Quantitatively, two-phase parsing improves grouping accuracy (GA), F1 score of grouping accuracy (FGA), and F1 score of template accuracy (FTA) across methods. SynLog+ delivers substantial gains in parsing accuracy (PA) for syntax-based parsers (about 236% on average) and moderate gains for semantic-based parsers (about 20% on average), while maintaining efficiency. The findings advocate two-phase parsing as a practical, generalizable design and position SynLog+ as an effective, domain-agnostic enhancement for large-scale log analysis workflows.
Abstract
Logs provide valuable insights into system runtime behavior and assist in software development and maintenance. Log parsing, which converts semi-structured log data into structured log data, is often the first step in automated log analysis. Given the wide range of log parsers utilizing diverse techniques, it is essential to evaluate them to understand their characteristics and performance. In this paper, we conduct a comprehensive empirical study comparing syntax- and semantic-based log parsers, as well as single-phase and two-phase parsing architectures. Our experiments reveal that semantic-based methods are better at identifying the correct templates. Syntax-based log parsers, in contrast, are 10 to 1,000 times more efficient and provide better grouping accuracy, although they fall short in accurate template identification. Moreover, a two-phase architecture consistently improves accuracy compared to a single-phase architecture. Based on the findings of this study, we propose SynLog+, a template identification module that acts as the second phase in a two-phase log parsing architecture. SynLog+ improves the parsing accuracy of syntax-based and semantic-based log parsers by 236% and 20% on average, respectively, with virtually no additional runtime cost.
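To make the two-phase idea concrete, the following is a minimal illustrative sketch in Python, not SynLog+ itself. It assumes a toy syntax-based grouper (bucketing by token count and first token) and a few hypothetical masking regexes; the names `MASKS`, `anonymize`, `group_logs`, and `identify_template` are invented for this example, and the actual rules and grouping strategy used by SynLog+ are not given in this summary.

```python
import re
from collections import defaultdict

# Hypothetical masking rules for illustration only; these are assumptions,
# not the regexes used by SynLog+.
MASKS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                    # hex literals
    re.compile(r"\b\d+\b"),                               # plain integers
]


def anonymize(line):
    """Replace obviously variable substrings with the wildcard <*>."""
    for pattern in MASKS:
        line = pattern.sub("<*>", line)
    return line


def group_logs(lines):
    """Phase 1 (toy grouper): bucket lines by token count and first token."""
    groups = defaultdict(list)
    for line in lines:
        tokens = anonymize(line).split()
        key = (len(tokens), tokens[0] if tokens else "")
        groups[key].append(line)
    return list(groups.values())


def identify_template(group):
    """Phase 2: keep tokens that are constant across the group, wildcard the rest."""
    token_rows = [anonymize(line).split() for line in group]
    template = []
    for column in zip(*token_rows):
        template.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(template)


if __name__ == "__main__":
    logs = [
        "Connected to 10.0.0.1:8080 in 35 ms",
        "Connected to 10.0.0.2:9090 in 7 ms",
        "Session abc closed",
        "Session xyz closed",
    ]
    for group in group_logs(logs):
        print(identify_template(group))
```

Separating the two phases means the grouper only has to decide which lines belong together, while template identification runs once per group rather than once per line, which is one plausible reason a second phase can add little runtime cost.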
