CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions

Matan Levi; Yair Alluouche; Daniel Ohayon; Anton Puzanov

TL;DR

This work tackles the challenge of adapting LLMs to the cyber-security domain by introducing SecKnowledge, a domain-knowledge-driven instruction dataset created via expert-driven schemas and content-grounded synthetic data. CyberPal.AI, a family of security-specialized LLMs, is fine-tuned with SecKnowledge to improve complex security instruction following, threat hunting, and CTI reasoning. To evaluate generalization and domain understanding, the authors present SecKnowledge-Eval, a broad benchmark suite covering MCQA, classification, summarization, and CTI relationship tasks, plus adversarial assessments. Empirically, CyberPal.AI achieves up to 24% improvement on training-aligned tasks and 10% on public cyber-security benchmarks, demonstrating robust domain expertise and potential for practical security analysis and response.

Abstract

Large Language Models (LLMs) have significantly advanced natural language processing (NLP), providing versatile capabilities across various applications. However, their application to complex, domain-specific tasks, such as cyber-security, often faces substantial challenges. In this study, we introduce SecKnowledge and CyberPal.AI to address these challenges and train security-expert LLMs. SecKnowledge is a domain-knowledge-driven cyber-security instruction dataset, meticulously designed using years of accumulated expert knowledge in the domain through a multi-phase generation process. CyberPal.AI refers to a family of LLMs fine-tuned using SecKnowledge, aimed at building security-specialized LLMs capable of answering and following complex security-related instructions. Additionally, we introduce SecKnowledge-Eval, a comprehensive and diverse cyber-security evaluation benchmark, composed of an extensive set of cyber-security tasks we specifically developed to assess LLMs in the field of cyber-security, along with other publicly available security benchmarks. Our results show a significant average improvement of up to 24% over the baseline models, underscoring the benefits of our expert-driven instruction dataset generation process. These findings contribute to the advancement of AI-based cyber-security applications, paving the way for security-expert LLMs that can enhance threat-hunting and investigation processes.

Paper Structure (40 sections, 14 figures, 7 tables)

Introduction
Related Work
General Domains Instruction-Tuning
Domain specific Instruction-Tuning
SecKnowledge: Domain-knowledge driven Cyber-security Instruction dataset
First Generation Step: Domain knowledge-driven instruction generation
Structure-driven instruction generation
Structured LLM-Augmented Instruction Generation
BRON:
Paths Extraction
Derive the Connection Between Direct Nodes
Constructing CoT on Paths
Multi-path CoT
SIEM Rules to TTP Mapping:
Sigma Rules:
...and 25 more sections

Figures (14)

Figure 1: Relationship between different MITRE ATT&CK components.
Figure 2: Example of constructing CoT by utilizing our knowledge of the structure relationships between different components within the MITRE ATT&CK framework. On the left side, there is a template to map from a given malware usage to its corresponding tactic. On the right side, the template is assigned with a specific malware usage and its chain of connections up to the relevant tactic.
Figure 3: BRON high-level graph structure overview
Figure 4: Illustration depicting the construction of a CoT by extracting paths from BRON, enriching them with data and domain knowledge, and using LLM to formulate connections based on the provided information.
Figure 5: Example of generated instruction using our mapping process from SIEM rules to TTPs. The answer is the explanation that was generated in our construction process.
...and 9 more figures
