Using Large Language Models for Template Detection from Security Event Logs

Risto Vaarandi, Hayretdin Bahsi

TL;DR

The paper addresses unsupervised template detection in unstructured security event logs by introducing LLM-TD, a method that uses small local LLMs in an in-context learning setup to discover multiple templates from each batch of log lines. It demonstrates that LLM-TD can compete with traditional template miners like Drain on five Linux syslog datasets, with OpenChat typically delivering the best balance of accuracy and speed, and it highlights the privacy advantage of running on local models. A key contribution is the two-pass processing and batch-based prompting strategy that allows multiple templates to be inferred per query, along with heuristic evaluation principles (P1 and P2) that assess template correctness beyond conventional metrics. The work also provides public datasets and open-source tooling, and emphasizes qualitative analysis of detected templates to uncover actionable insights and potentially new knowledge from log data.

Abstract

In modern IT systems and computer networks, real-time and offline event log analysis is a crucial part of cyber security monitoring. In particular, event log analysis techniques are essential for the timely detection of cyber attacks and for assisting security experts with the analysis of past security incidents. The detection of line patterns or templates from unstructured textual event logs has been identified as an important task of event log analysis since detected templates represent event types in the event log and prepare the logs for downstream online or offline security monitoring tasks. During the last two decades, a number of template mining algorithms have been proposed. However, many proposed algorithms rely on traditional data mining techniques, and the usage of Large Language Models (LLMs) has received less attention so far. Also, most approaches that harness LLMs are supervised, and unsupervised LLM-based template mining remains an understudied area. The current paper addresses this research gap and investigates the application of LLMs for unsupervised detection of templates from unstructured security event logs.
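
To make the notion of a template concrete, the following is a minimal Python sketch of how a set of already detected templates could be used to group log lines into event types. It is an illustration only and not the paper's LLM-TD method: the `<*>` wildcard notation, the function names `template_to_regex` and `group_by_template`, and the sample syslog lines are all assumptions introduced here.

```python
import re

WILDCARD = "<*>"  # assumed placeholder notation, not taken from the paper


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template so that each wildcard matches one whitespace-free token."""
    literal_parts = [re.escape(part) for part in template.split(WILDCARD)]
    return re.compile(r"\S+".join(literal_parts))


def group_by_template(lines, templates):
    """Assign each line to the first template that matches it in full."""
    compiled = [(t, template_to_regex(t)) for t in templates]
    groups = {t: [] for t in templates}
    unmatched = []
    for line in lines:
        for template, pattern in compiled:
            if pattern.fullmatch(line):
                groups[template].append(line)
                break
        else:
            # No template covered this line.
            unmatched.append(line)
    return groups, unmatched


# Hypothetical sample data, invented for illustration.
templates = [
    "sshd[<*>]: Failed password for <*> from <*> port <*> ssh2",
    "sshd[<*>]: Accepted password for <*> from <*> port <*> ssh2",
]
lines = [
    "sshd[1234]: Failed password for root from 10.0.0.5 port 22 ssh2",
    "sshd[1240]: Accepted password for alice from 10.0.0.7 port 50022 ssh2",
    "su: session opened for user root",
]

groups, unmatched = group_by_template(lines, templates)
```

Lines left in `unmatched` are the ones for which no template exists yet, which is the situation a template detection method is meant to resolve.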

Paper Structure

This paper contains 12 sections, 1 equation, 7 figures, 6 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Example syslog messages and their templates
  • Figure 2: Interaction between LLM-TD and the underlying LLM (QueryLLMforTemplates and Merge procedures from the LLM-TD algorithm)
  • Figure 3: An example event log with detected templates and ground truth
  • Figure 4: Inferring ground truth templates from insufficient log data
  • Figure 5: Types of incorrectly detected templates
  • ...and 2 more figures