Using Large Language Models for Template Detection from Security Event Logs
Risto Vaarandi, Hayretdin Bahsi
TL;DR
The paper addresses unsupervised template detection in unstructured security event logs. It introduces LLM-TD, a method that uses small local LLMs in an unsupervised, in-context learning setup to discover multiple templates from each batch of log lines. On five Linux syslog datasets, LLM-TD is competitive with traditional template miners such as Drain, with OpenChat typically delivering the best balance of accuracy and speed. Because the method operates on local models, it also offers privacy advantages. Key contributions are a two-pass processing scheme and a batch-based prompting strategy that allows multiple templates to be inferred per query, along with two heuristic evaluation principles (P1 and P2) for assessing template correctness beyond conventional metrics. The work also provides public datasets and open-source tooling, and it emphasizes qualitative analysis of the detected templates as a way to uncover actionable insights and potentially new knowledge from log data.
Abstract
In modern IT systems and computer networks, real-time and offline event log analysis is a crucial part of cyber security monitoring. In particular, event log analysis techniques are essential for the timely detection of cyber attacks and for assisting security experts with the analysis of past security incidents. The detection of line patterns or templates from unstructured textual event logs has been identified as an important task of event log analysis since detected templates represent event types in the event log and prepare the logs for downstream online or offline security monitoring tasks. During the last two decades, a number of template mining algorithms have been proposed. However, many proposed algorithms rely on traditional data mining techniques, and the usage of Large Language Models (LLMs) has received less attention so far. Also, most approaches that harness LLMs are supervised, and unsupervised LLM-based template mining remains an understudied area. The current paper addresses this research gap and investigates the application of LLMs for unsupervised detection of templates from unstructured security event logs.
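To make the notion of a template concrete, the following minimal sketch shows how a template can stand for an event type and how log lines can be assigned to it. It is an illustration only, not the paper's LLM-TD method: the `<*>` wildcard notation, the helper names `template_to_regex` and `assign_templates`, and the sample sshd-style messages are all assumptions introduced here.

```python
import re

WILDCARD = "<*>"  # assumed placeholder for a variable field; not taken from the paper


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template into a regex where each wildcard matches one token."""
    # Escape the constant parts so that characters like '.' or '[' stay literal.
    literal_parts = [re.escape(part) for part in template.split(WILDCARD)]
    # Join the constant parts with a pattern for one whitespace-free token.
    return re.compile(r"\S+".join(literal_parts))


def assign_templates(lines, templates):
    """Map each log line to the first template that matches it, or None."""
    compiled = [(t, template_to_regex(t)) for t in templates]
    assignment = {}
    for line in lines:
        assignment[line] = next(
            (t for t, rx in compiled if rx.fullmatch(line)), None
        )
    return assignment


# Hypothetical log messages, invented for this illustration.
SAMPLE_LINES = [
    "Failed password for root from 10.0.0.5 port 22 ssh2",
    "Failed password for admin from 192.168.1.7 port 51234 ssh2",
    "Accepted publickey for alice from 10.0.0.9 port 40022 ssh2",
]
SAMPLE_TEMPLATES = ["Failed password for <*> from <*> port <*> ssh2"]

if __name__ == "__main__":
    for line, template in assign_templates(SAMPLE_LINES, SAMPLE_TEMPLATES).items():
        print(f"{template!r:55} <- {line}")
```

In this sketch, lines that match no known template map to `None`; in a monitoring setting such lines would be candidates for further template discovery or manual review.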
