In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach

Yiran Gao, Kim Hammar, Tao Li

TL;DR

This work tackles the challenge of rapidly responding to evolving cyberattacks by replacing simulation-heavy reinforcement learning with an in-context, end-to-end LLM agent for incident response. It proposes a 14B-parameter LLM that integrates perception, reasoning, planning, and action. The model is fine-tuned offline via LoRA on 50k incident logs with chain-of-thought annotations, and it performs online lookahead planning with a world model derived from the LLM's own in-context understanding. Planning uses Monte-Carlo-style rollouts to compare candidate actions and re-calibrates the conjectured attack tactics as new alerts arrive, yielding a policy that aims to minimize the recovery cost $J(s_0)=\sum_{t=0}^{\tau-1} c(s_t,a_t)$ with terminal state $s_\tau=(1,1,1,1,1,1)$. On incident logs reported in the literature, the approach achieves up to $23\%$ faster recovery than frontier LLMs, demonstrating end-to-end adaptive response without explicit modeling and on commodity hardware. The main limitations are scalability and the need for more realistic cost modeling, which motivate future work on cost-efficient simulation and longer-horizon action sequences.
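To make the rollout-based planning step concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the LLM world model is replaced by a hypothetical stochastic stub (`toy_world_model`), the cost is assumed to be one unit per action, and actions are assumed to be integers that each target one of the six recovery flags. All function names are made up for this example.

```python
import random
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]            # six binary recovery flags, 1 = stage completed
TERMINAL: State = (1, 1, 1, 1, 1, 1)


def toy_world_model(state: State, action: int, rng: random.Random) -> State:
    """Hypothetical stand-in for the LLM world model (an assumption).

    Action k tries to complete recovery stage k and succeeds with
    probability 0.8; otherwise the state is unchanged.
    """
    if state[action] == 1 or rng.random() > 0.8:
        return state
    return state[:action] + (1,) + state[action + 1:]


def base_policy(state: State) -> int:
    """Hypothetical default policy: work on the first unfinished stage."""
    return next((i for i, flag in enumerate(state) if flag == 0), 0)


def rollout_cost(state: State, first_action: int,
                 policy: Callable[[State], int],
                 world_model: Callable[[State, int, random.Random], State],
                 rng: random.Random, horizon: int = 50,
                 step_cost: float = 1.0) -> float:
    """Cumulative cost of taking first_action, then following policy."""
    cost, action = 0.0, first_action
    for _ in range(horizon):
        if state == TERMINAL:
            break
        cost += step_cost                      # assumed unit cost c(s_t, a_t)
        state = world_model(state, action, rng)
        action = policy(state)
    return cost


def select_action(state: State, candidates: Sequence[int],
                  policy: Callable[[State], int],
                  world_model: Callable[[State, int, random.Random], State],
                  rng: random.Random, num_rollouts: int = 20) -> int:
    """Pick the candidate with the lowest average simulated recovery cost."""
    def mean_cost(action: int) -> float:
        total = sum(rollout_cost(state, action, policy, world_model, rng)
                    for _ in range(num_rollouts))
        return total / num_rollouts
    return min(candidates, key=mean_cost)


# Usage: compare a wasted action (stage 0 is already done) with a useful one.
best = select_action((1, 1, 0, 0, 0, 0), [0, 2], base_policy,
                     toy_world_model, random.Random(0), num_rollouts=300)
print("selected action:", best)
```

Averaging over several rollouts matters only because the stub is stochastic; with a deterministic world model a single rollout per candidate would give the same ranking.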

Abstract

Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively explored reinforcement learning, which learns response strategies through extensive simulation of the incident. While this approach can be effective, it requires a handcrafted simulator model and suppresses useful semantics from raw system logs and alerts. To address these limitations, we propose to leverage the pre-trained security knowledge and in-context learning of large language models (LLMs) to create an end-to-end agentic solution for incident response planning. Specifically, our agent integrates four functionalities (perception, reasoning, planning, and action) into one lightweight LLM (a 14B model). Through fine-tuning and chain-of-thought reasoning, our LLM agent is capable of processing system logs and inferring the underlying network state (perception), updating its conjecture of attack models (reasoning), simulating consequences under different response strategies (planning), and generating an effective response (action). By comparing LLM-simulated outcomes with actual observations, the LLM agent repeatedly refines its attack conjecture and the corresponding response, thereby demonstrating in-context adaptation. Our agentic approach requires no explicit modeling and can run on commodity hardware. When evaluated on incident logs reported in the literature, our agent achieves recovery up to 23% faster than frontier LLMs.
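The adaptation step described above (compare the simulated outcome with the actual observation, then revise the attack conjecture) can be sketched as follows. This is a toy illustration, not the paper's mechanism: in the paper the LLM produces predictions and conjectures through prompts, whereas here a hypothetical lookup table `PREDICTED_ALERT` plays that role, and every tactic, action, and alert name is invented for the example.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical table standing in for LLM-simulated outcomes (an assumption):
# (conjectured tactic, response action) -> alert expected after the action.
PREDICTED_ALERT: Dict[Tuple[str, str], str] = {
    ("brute_force", "block_ip"): "no_alert",
    ("lateral_movement", "block_ip"): "internal_scan",
    ("brute_force", "isolate_host"): "no_alert",
    ("lateral_movement", "isolate_host"): "no_alert",
}


def refine_conjecture(conjecture: str, action: str, observed_alert: str,
                      tactics: Sequence[str]) -> str:
    """Keep the conjecture if its prediction matches the observation.

    Otherwise switch to the first tactic whose predicted alert does match;
    if no tactic explains the observation, keep the current conjecture.
    """
    if PREDICTED_ALERT[(conjecture, action)] == observed_alert:
        return conjecture
    for tactic in tactics:
        if PREDICTED_ALERT[(tactic, action)] == observed_alert:
            return tactic
    return conjecture


# Usage: the agent assumed a brute-force attack and blocked an IP, but an
# internal scan was still observed, which the brute-force conjecture
# does not predict.
tactics = ["brute_force", "lateral_movement"]
updated = refine_conjecture("brute_force", "block_ip", "internal_scan", tactics)
print(updated)
```

The design point is only that a mismatch between prediction and observation triggers a revision; how the revised conjecture is chosen is the part the LLM handles in the actual system.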

Paper Structure (10 sections, 4 equations, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of the two stages of our approach. In the first stage [cf. a)], an LLM is fine-tuned offline using a dataset of incident logs, each paired with corresponding response plans and chain-of-thought reasoning traces. In the second stage [cf. b)], the fine-tuned LLM processes system logs and threat intelligence online to generate $N$ candidate response actions. A planning agent then evaluates these candidates through rollout and in-context adaptation, after which it selects the most effective action.
  • Figure 2: Two example evolutions of the recovery state $s_t$. The first recovery trajectory involves the actions $a_0,a_1,a_2,a_3,a_4, a_5, a_6$ and the second trajectory involves the actions $a_0, a_1, a^{\prime}_2, a_3, a_4, a^{\prime}_5$.
  • Figure 3: Evaluation results ($\downarrow$ better): comparison between our method and frontier LLMs. Bar colors relate to different methods; bar groups indicate performance metrics; numbers and error bars indicate the mean and the standard deviation from $5$ evaluations with different random seeds.
  • Figure 4: Ablation-study results for the recovery time metric ($\downarrow$ better). Each bar group corresponds to a specific step of our method; filled bars show the performance with the step included and dotted bars show the performance with the step removed; numbers and error bars indicate the mean and the standard deviation from $5$ evaluations with different random seeds.
  • Figure 5: An incident example from GARCIA2014100 (top) and the prompt for action generation (bottom).
  • ...and 1 more figures
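
As a worked illustration of the two trajectories in the Figure 2 caption, assume a unit cost $c(s_t,a_t)=1$ per action and that each trajectory reaches the terminal state right after its last listed action. Both are assumptions made for this example, not the paper's cost model. The first trajectory ($a_0,\dots,a_6$) then has $\tau=7$ and the second ($a_0, a_1, a^{\prime}_2, a_3, a_4, a^{\prime}_5$) has $\tau^{\prime}=6$, so that

$$J(s_0)=\sum_{t=0}^{6} 1 = 7, \qquad J^{\prime}(s_0)=\sum_{t=0}^{5} 1 = 6.$$

Under these assumptions, the second trajectory is preferred because it reaches the terminal state one step earlier.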