In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach
Yiran Gao, Kim Hammar, Tao Li
TL;DR
This work tackles the challenge of rapidly responding to evolving cyberattacks by moving beyond simulation-heavy reinforcement learning toward an in-context, end-to-end LLM agent for incident response. It proposes a 14B-parameter LLM that integrates perception, reasoning, planning, and action. The model is fine-tuned offline via LoRA on 50k incident logs with chain-of-thought annotations, and it performs online lookahead planning using a world model derived from its own in-context understanding. Planning uses Monte-Carlo-style rollouts to compare candidate actions and re-calibrates the conjectured attack tactics as new alerts arrive, yielding a policy that minimizes the recovery cost $J(s_0)=\sum_{t=0}^{\tau-1} c(s_t,a_t)$, where $\tau$ is the time step at which the terminal state $s_\tau=(1,1,1,1,1,1)$ is reached. On incident logs reported in the literature, the approach achieves up to 23% faster recovery than frontier LLMs, demonstrating end-to-end adaptive response without explicit modeling and feasibility on commodity hardware. The main limitations are scalability and the need for more realistic cost modeling, motivating future work on cost-efficient simulation and longer-horizon action sequences.
Abstract
Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively explored reinforcement learning, which learns response strategies through large-scale simulation of the incident. While this approach can be effective, it requires a handcrafted simulator model and suppresses useful semantics from raw system logs and alerts. To address these limitations, we propose to leverage the pre-trained security knowledge and in-context learning of large language models (LLMs) to create an end-to-end agentic solution for incident response planning. Specifically, our agent integrates four functionalities (perception, reasoning, planning, and action) into one lightweight LLM with 14B parameters. Through fine-tuning and chain-of-thought reasoning, our LLM agent is capable of processing system logs and inferring the underlying network state (perception), updating its conjecture of attack models (reasoning), simulating consequences under different response strategies (planning), and generating an effective response (action). By comparing LLM-simulated outcomes with actual observations, the LLM agent repeatedly refines its attack conjecture and the corresponding response, thereby demonstrating in-context adaptation. Our agentic approach requires no handcrafted modeling and can run on commodity hardware. When evaluated on incident logs reported in the literature, our agent achieves recovery up to 23% faster than frontier LLMs.
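
The planning step described above (simulating outcomes under different responses and choosing the cheapest) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_world_model`, the unit cost per step, the rollout policy, and the reading of the state as six binary recovery flags are all assumptions made for illustration. In the paper the world model is the LLM itself; here a simple stochastic function stands in for it.

```python
import random
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]
TERMINAL: State = (1, 1, 1, 1, 1, 1)  # all six flags set, as in the TL;DR

WorldModel = Callable[[State, int, random.Random], State]


def toy_world_model(state: State, action: int, rng: random.Random) -> State:
    """Hypothetical stand-in for the LLM world model: action i sets flag i
    to 1 with probability 0.8; otherwise the state is unchanged."""
    if rng.random() < 0.8:
        return state[:action] + (1,) + state[action + 1:]
    return state


def rollout_cost(state: State, first_action: int, model: WorldModel,
                 rng: random.Random, horizon: int = 20) -> float:
    """Simulate one trajectory and return its accumulated cost.
    Assumption: every step costs 1, so the cost equals the recovery time."""
    cost = 0.0
    action = first_action
    for _ in range(horizon):
        if state == TERMINAL:
            break
        state = model(state, action, rng)
        cost += 1.0
        if state != TERMINAL:
            # Assumed rollout policy: act on the first flag that is still 0.
            action = state.index(0)
    return cost


def plan(state: State, actions: Sequence[int], model: WorldModel,
         rng: random.Random, n_rollouts: int = 50) -> int:
    """Return the candidate action with the lowest mean simulated cost."""
    def mean_cost(a: int) -> float:
        total = sum(rollout_cost(state, a, model, rng) for _ in range(n_rollouts))
        return total / n_rollouts
    return min(actions, key=mean_cost)


if __name__ == "__main__":
    rng = random.Random(0)
    # Only flag 3 is still unset, so acting on it should be cheapest.
    print(plan((1, 1, 1, 0, 1, 1), range(6), toy_world_model, rng))
```

In an online loop, the agent would re-run `plan` after each new observation, so that a revised attack conjecture changes the simulated outcomes and hence the chosen response.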
