Hallucination-Resistant Security Planning with a Large Language Model

Kim Hammar; Tansu Alpcan; Emil Lupu

TL;DR

The paper tackles hallucination risk in LLM-assisted security planning by embedding the LLM in an iterative verification-refinement loop. In each iteration, the LLM generates candidate actions, and the framework evaluates their consistency via lookahead predictions; when consistency is low, it abstains and collects external feedback for in-context learning (ICL). The paper provides theoretical guarantees: a bound on the hallucination probability that can be tuned through the consistency threshold, and a Bayesian regret bound for ICL, along with convergence results. Empirically, the framework reduces incident-response recovery time by up to 30% compared to frontier LLMs in experiments on four public datasets, and an ablation shows that each component (lookahead, ICL, abstention) improves performance. The approach offers a practical, theoretically grounded method for reliable LLM-based decision support in security management, with the potential to generalize to broader security tasks.

Abstract

Large language models (LLMs) are promising tools for supporting security management tasks, such as incident response planning. However, their unreliability and tendency to hallucinate remain significant challenges. In this paper, we address these challenges by introducing a principled framework for using an LLM as decision support in security management. Our framework integrates the LLM in an iterative loop where it generates candidate actions that are checked for consistency with system constraints and lookahead predictions. When consistency is low, we abstain from the generated actions and instead collect external feedback, e.g., by evaluating actions in a digital twin. This feedback is then used to refine the candidate actions through in-context learning (ICL). We prove that this design allows the hallucination risk to be controlled by tuning the consistency threshold. Moreover, we establish a bound on the regret of ICL under certain assumptions. To evaluate our framework, we apply it to an incident response use case where the goal is to generate a response and recovery plan based on system logs. Experiments on four public datasets show that our framework reduces recovery times by up to 30% compared to frontier LLMs.
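
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`plan_with_abstention`, `toy_generate`, `toy_consistency`, `toy_select`, `toy_feedback`), the round limit `max_rounds`, and the toy consistency score (the fraction of candidates that agree with the most common one) are all hypothetical stand-ins. In the paper, consistency is evaluated against system constraints and lookahead predictions, and feedback comes from an external source such as a digital twin.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def plan_with_abstention(
    generate: Callable[[List[str]], List[str]],
    consistency: Callable[[List[str]], float],
    select: Callable[[List[str]], str],
    feedback: Callable[[List[str]], str],
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Generate, check consistency, and abstain to collect feedback if needed.

    All callables are hypothetical placeholders for the LLM, the consistency
    check, the action selection rule, and the external feedback source.
    """
    context: List[str] = []  # feedback accumulated for in-context learning
    for _ in range(max_rounds):
        candidates = generate(context)
        if consistency(candidates) >= threshold:
            # Consistency is high enough: commit to an action.
            return select(candidates), context
        # Consistency is low: abstain and add external feedback to the context.
        context.append(feedback(candidates))
    return None, context  # still abstaining after max_rounds


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_generate(context: List[str]) -> List[str]:
    # Pretend the LLM's candidates agree once enough feedback is in context.
    if len(context) >= 2:
        return ["isolate host", "isolate host", "isolate host"]
    return ["isolate host", "reboot server", "block ip"]


def toy_consistency(candidates: List[str]) -> float:
    # Fraction of candidates equal to the most common candidate.
    _, count = Counter(candidates).most_common(1)[0]
    return count / len(candidates)


def toy_select(candidates: List[str]) -> str:
    return Counter(candidates).most_common(1)[0][0]


def toy_feedback(candidates: List[str]) -> str:
    return f"feedback on {len(candidates)} candidate actions"


if __name__ == "__main__":
    action, ctx = plan_with_abstention(
        toy_generate, toy_consistency, toy_select, toy_feedback, threshold=0.6
    )
    print(action, len(ctx))
```

In this sketch, raising `threshold` makes the loop abstain more often, which mirrors the role the abstract assigns to the consistency threshold as the knob for controlling hallucination risk.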

Paper Structure (21 sections, 3 theorems, 17 equations, 11 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Problem Statement
Our Framework for Hallucination-Resistant Security Planning with an LLM
Framework Overview
Using the LLM to Generate Candidate Actions
Evaluating the Consistency of Candidate Actions
In-Context Learning from Feedback
Theoretical Analysis of Our Framework
Controlling the Hallucination Probability
Convergence of In-Context Learning
Experimental Evaluation
Experimental Setup
Evaluation Results
Discussion of the Evaluation Results
...and 6 more sections

Key Result

Proposition 1

Assume the sets $\{\mathcal{A}^i\}^n_{i=1}$ are independent and identically distributed (i.i.d.). Let $\tilde{\mathcal{A}}$ be a test example from the same distribution and let $\kappa \in (0,1]$ be a desirable upper bound on the hallucination probability. Define the threshold where $\lceil \cdot \rceil$ is the ceiling function. We have

Figures (11)

Figure 1: Our framework for hallucination-resistant security planning with a large language model (LLM). We integrate the LLM in an iterative verification and refinement loop, in which the LLM is used to generate candidate actions that are checked for consistency with lookahead predictions. When consistency is low, we abstain from the generated actions and collect external feedback, which allows the candidate actions to be refined through in-context learning (ICL).
Figure 2: Phases and performance metrics of the incident response use case.
Figure 3: Our framework for hallucination-resistant security planning with a large language model (LLM). Our framework integrates the LLM inside an iterative loop of verification and refinement. In this loop, the LLM is prompted with details about a management task and generates candidate actions, which are then evaluated for consistency against lookahead predictions. If the consistency is low, the framework abstains from selecting an action and instead gathers external feedback, e.g., by testing the actions in a digital twin or asking a security expert. This feedback is subsequently leveraged to improve the candidate actions through in-context learning (ICL). The iterative verification and refinement procedure continues until an action that meets the consistency criterion is found.
Figure 4: Architecture for collecting feedback by evaluating the effects of actions using a digital twin, i.e., a virtual replica of the target system [10154288hammar_stadler_tnsm].
Figure 5: Number of occurrences of different MITRE ATT&CK tactics [strom2018mitre] among the incidents in the evaluation datasets; cf. Table \ref{tab:dataset_types}.
...and 6 more figures

Theorems & Definitions (11)

Definition 1: Hallucinated action
Remark 1
Proposition 1
proof
Remark 2
Proposition 2
proof
Corollary 1
proof
Remark 3
...and 1 more
