CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage
Bowen Wei, Yuan Shen Tay, Howard Liu, Jinhao Pan, Kun Luo, Ziwei Zhu, Chris Jordan
TL;DR
This work addresses the challenge of alert fatigue in Security Operations Centers by introducing CORTEX, a multi-agent LLM architecture for high-stakes alert triage. CORTEX splits the investigation into specialized roles (Behavior Analysis, Evidence Acquisition, and Reasoning & Coordination), anchored by a four-stage pipeline and grounded in a typed tool library to produce auditable, evidence-backed decisions. A fine-grained SOC workflow dataset capturing end-to-end investigations enables process-level supervision and robust evaluation. Empirically, CORTEX delivers stronger triage performance (actionable F1 up to $0.78$ from $0.66$) and lower false positives (down to about $14.2\%$) at the cost of higher latency (median $152.4$ s) and greater token and tool-output volumes ($23{,}600$ tokens) relative to single-agent baselines, demonstrating a favorable accuracy-efficiency trade-off for critical security operations. The work offers a practical blueprint for auditable, role-specialized LLM agents in safety-critical domains and provides a valuable dataset to catalyze further research on reliable, interpretable automated SOC triage.
Abstract
Security Operations Centers (SOCs) are overwhelmed by tens of thousands of daily alerts, with only a small fraction corresponding to genuine attacks. This overload creates alert fatigue, leading to overlooked threats and analyst burnout. Classical detection pipelines are brittle and context-poor, while recent LLM-based approaches typically rely on a single model to interpret logs, retrieve context, and adjudicate alerts end-to-end -- an approach that struggles with noisy enterprise data and offers limited transparency. We propose CORTEX, a multi-agent LLM architecture for high-stakes alert triage in which specialized agents collaborate over real evidence: a behavior-analysis agent inspects activity sequences, evidence-gathering agents query external systems, and a reasoning agent synthesizes findings into an auditable decision. To support training and evaluation, we release a dataset of fine-grained SOC investigations from production environments, capturing step-by-step analyst actions and linked tool outputs. Across diverse enterprise scenarios, CORTEX substantially reduces false positives and improves investigation quality over state-of-the-art single-agent LLMs.
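The role split described above can be illustrated with a minimal sketch. This is not the CORTEX implementation: plain Python functions stand in for the LLM agents, and every name here (`Evidence`, `Decision`, `TOOLS`, `ip_reputation`, the blocklist entry, and the two-source escalation rule) is a hypothetical chosen for illustration. It only shows how a behavior-analysis step, an evidence-gathering step over a typed tool registry, and a reasoning step could be composed so that the final decision carries its supporting evidence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evidence:
    source: str       # which role or tool produced this record
    finding: str      # short human-readable result
    suspicious: bool  # whether the finding supports escalation


@dataclass(frozen=True)
class Decision:
    actionable: bool
    evidence: List[Evidence]  # audit trail backing the verdict


# Hypothetical typed tool library: tool name -> function from an entity to Evidence.
Tool = Callable[[str], Evidence]


def ip_reputation(ip: str) -> Evidence:
    # Stand-in for a real threat-intel lookup; the blocklist is made up.
    bad = ip in {"203.0.113.7"}
    return Evidence("ip_reputation", f"{ip} on blocklist: {bad}", bad)


TOOLS: Dict[str, Tool] = {"ip_reputation": ip_reputation}


def behavior_analysis(events: List[str]) -> List[Evidence]:
    """Behavior-analysis role: flag a simple suspicious activity sequence."""
    burst = sum(e == "login_failed" for e in events) >= 3
    return [Evidence("behavior", f"failed-login burst: {burst}", burst)]


def evidence_acquisition(ip: str) -> List[Evidence]:
    """Evidence-gathering role: query every registered tool."""
    return [tool(ip) for tool in TOOLS.values()]


def reason(evidence: List[Evidence]) -> Decision:
    """Reasoning role: toy rule, escalate only if two sources agree."""
    hits = sum(e.suspicious for e in evidence)
    return Decision(actionable=hits >= 2, evidence=evidence)


def triage(events: List[str], ip: str) -> Decision:
    # Compose the three roles; the returned Decision keeps all evidence.
    return reason(behavior_analysis(events) + evidence_acquisition(ip))
```

Calling `triage(["login_failed"] * 3, "203.0.113.7")` in this sketch yields an actionable decision backed by two evidence records, while a single benign event with the same IP does not escalate because only one source is suspicious.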
