OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models

Luca Cotti, Idilio Drago, Anisa Rula, Devis Bianchini, Federico Cerutti

TL;DR

OntoLogX presents an autonomous AI agent that converts raw cybersecurity logs into ontology-grounded knowledge graphs using retrieval-augmented generation and iterative correction. By grounding logs in a lightweight cybersecurity ontology and aligning sessions with MITRE ATT&CK tactics, the approach enables structured, interoperable threat intelligence extraction from heterogeneous logs. Experimental results show that retrieval and correction substantially improve precision and recall, with code-oriented LLMs delivering strong performance in structured log analysis. The work demonstrates practical CTI benefits and highlights scalability and cost as key areas for future optimization and extension to additional log sources.

Abstract

System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of malicious activity. Yet their utility is often limited by lack of structure, semantic inconsistency, and fragmentation across devices and sessions. Extracting actionable CTI from logs therefore requires approaches that can reconcile noisy, heterogeneous data into coherent and interoperable representations. We introduce OntoLogX, an autonomous Artificial Intelligence (AI) agent that leverages Large Language Models (LLMs) to transform raw logs into ontology-grounded Knowledge Graphs (KGs). OntoLogX integrates a lightweight log ontology with Retrieval Augmented Generation (RAG) and iterative correction steps, ensuring that generated KGs are syntactically and semantically valid. Beyond event-level analysis, the system aggregates KGs into sessions and employs an LLM to predict MITRE ATT&CK tactics, linking low-level log evidence to higher-level adversarial objectives. We evaluate OntoLogX on logs from both a public benchmark and a real-world honeypot dataset, demonstrating robust KG generation across multiple KG backends and accurate mapping of adversarial activity to ATT&CK tactics. Results highlight the benefits of retrieval and correction for precision and recall, the effectiveness of code-oriented models in structured log analysis, and the value of ontology-grounded representations for actionable CTI extraction.
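
The iterative correction idea mentioned in the abstract can be illustrated with a minimal sketch: generate a candidate graph, validate it against the ontology vocabulary, and feed any errors back into the next attempt. Everything named here is an assumption made for illustration and not the authors' implementation: the predicate names in `ALLOWED_PREDICATES`, the triple representation, the `max_attempts` limit, and the `fake_llm` stand-in that replaces a real model call.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical vocabulary; the actual OntoLogX ontology terms are not given here.
ALLOWED_PREDICATES: Set[str] = {"hasSourceIP", "hasUser", "hasEventType"}


def validate(triples: List[Triple]) -> List[str]:
    """Return one error message per triple that uses an unknown predicate."""
    return [
        f"unknown predicate '{p}' in ({s}, {p}, {o})"
        for s, p, o in triples
        if p not in ALLOWED_PREDICATES
    ]


def generate_with_correction(
    log_line: str,
    generate: Callable[[str, List[str]], List[Triple]],
    max_attempts: int = 3,
) -> List[Triple]:
    """Call `generate` until its output validates or attempts run out."""
    errors: List[str] = []
    for _ in range(max_attempts):
        # Errors from the previous attempt are passed back as feedback.
        triples = generate(log_line, errors)
        errors = validate(triples)
        if not errors:
            return triples
    raise ValueError(f"no valid graph after {max_attempts} attempts: {errors}")


def fake_llm(log_line: str, feedback: List[str]) -> List[Triple]:
    """Stand-in for an LLM call: wrong on the first try, fixed after feedback."""
    predicate = "hasSourceIP" if feedback else "sourceAddress"
    return [("event1", predicate, "10.0.0.5")]


result = generate_with_correction("Failed password from 10.0.0.5", fake_llm)
print(result)
```

In this sketch the first attempt fails validation because `sourceAddress` is not in the vocabulary, and the second attempt succeeds once the error message is returned as feedback. A real system would presumably check far more than predicate names, for example class membership and datatype constraints.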

Paper Structure

This paper contains 26 sections, 1 equation, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Methodology for generating a log event KG, starting from the raw log event and optional context information.
  • Figure 2: Classes and object properties of the OntoLogX ontology. Data properties are omitted for conciseness. Full arrows indicate either rdfs:subClassOf or rdfs:subPropertyOf relations. Colored boxes highlight external classes.
  • Figure 3: Hybrid retrieval process.
  • Figure 4: Format of structured output. NodeType, PropertyType, and RelationshipType respectively represent the valid classes, data properties and object properties defined in the ontology.
  • Figure 5: Comparison of G-Eval scores across different configurations using the Qwen3 Coder 32B model.
  • ...and 7 more figures
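
The structured output format described in the Figure 4 caption, where node, property, and relationship types are restricted to terms defined in the ontology, might be sketched as follows. This is only an illustration under stated assumptions: the enum members, the field names (`id`, `type`, `properties`, `source`, `target`), and the `parse_graph` helper are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


# Hypothetical members; the ontology's actual terms are not listed in this excerpt.
class NodeType(Enum):
    EVENT = "Event"
    USER = "User"


class PropertyType(Enum):
    TIMESTAMP = "timestamp"
    USERNAME = "username"


class RelationshipType(Enum):
    HAS_USER = "hasUser"


@dataclass
class Node:
    id: str
    type: NodeType
    properties: Dict[PropertyType, str] = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    type: RelationshipType


@dataclass
class Graph:
    nodes: List[Node]
    relationships: List[Relationship]


def parse_graph(raw: dict) -> Graph:
    """Build a Graph from plain dicts; Enum lookup raises ValueError on unknown terms."""
    nodes = [
        Node(
            id=n["id"],
            type=NodeType(n["type"]),
            properties={PropertyType(k): v for k, v in n.get("properties", {}).items()},
        )
        for n in raw["nodes"]
    ]
    known_ids = {n.id for n in nodes}
    relationships = []
    for r in raw["relationships"]:
        # Reject edges that point at nodes missing from the output.
        if r["source"] not in known_ids or r["target"] not in known_ids:
            raise ValueError(f"relationship references unknown node: {r}")
        relationships.append(
            Relationship(r["source"], r["target"], RelationshipType(r["type"]))
        )
    return Graph(nodes, relationships)


raw_output = {
    "nodes": [
        {"id": "e1", "type": "Event", "properties": {"timestamp": "2024-01-01T00:00:00Z"}},
        {"id": "u1", "type": "User", "properties": {"username": "root"}},
    ],
    "relationships": [{"source": "e1", "target": "u1", "type": "hasUser"}],
}
graph = parse_graph(raw_output)
print(graph.nodes[0].type, graph.relationships[0].type)
```

Constraining each type field to an enumeration means any term outside the ontology is rejected at parse time, which is one plausible way to enforce the vocabulary constraint the caption describes.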