CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage

Bowen Wei, Yuan Shen Tay, Howard Liu, Jinhao Pan, Kun Luo, Ziwei Zhu, Chris Jordan

TL;DR

This work addresses alert fatigue in Security Operations Centers (SOCs) by introducing CORTEX, a multi-agent LLM architecture for high-stakes alert triage. CORTEX splits an investigation into specialized roles (Behavior Analysis, Evidence Acquisition, and Reasoning & Coordination), organized as a four-stage pipeline and grounded in a typed tool library, to produce auditable, evidence-backed decisions. A fine-grained SOC workflow dataset capturing end-to-end investigations enables process-level supervision and robust evaluation. Empirically, CORTEX delivers stronger triage performance (actionable F1 up to $0.78$, from $0.66$) and fewer false positives (down to about $14.2\%$), at the cost of higher latency (median $152.4$ s) and greater token and tool-output volume ($23{,}600$ tokens) relative to single-agent baselines. The authors present this as a favorable accuracy-efficiency trade-off for critical security operations. The work offers a practical blueprint for auditable, role-specialized LLM agents in safety-critical domains, together with a dataset intended to support further research on reliable, interpretable automated SOC triage.

Abstract

Security Operations Centers (SOCs) are overwhelmed by tens of thousands of daily alerts, with only a small fraction corresponding to genuine attacks. This overload creates alert fatigue, leading to overlooked threats and analyst burnout. Classical detection pipelines are brittle and context-poor, while recent LLM-based approaches typically rely on a single model to interpret logs, retrieve context, and adjudicate alerts end-to-end -- an approach that struggles with noisy enterprise data and offers limited transparency. We propose CORTEX, a multi-agent LLM architecture for high-stakes alert triage in which specialized agents collaborate over real evidence: a behavior-analysis agent inspects activity sequences, evidence-gathering agents query external systems, and a reasoning agent synthesizes findings into an auditable decision. To support training and evaluation, we release a dataset of fine-grained SOC investigations from production environments, capturing step-by-step analyst actions and linked tool outputs. Across diverse enterprise scenarios, CORTEX substantially reduces false positives and improves investigation quality over state-of-the-art single-agent LLMs.

Paper Structure

This paper contains 32 sections, 1 figure, 6 tables.

Figures (1)

  • Figure 1: CORTEX architecture. A security alert enters a four-stage pipeline. Stage 1: Orchestrator Agent manages execution and modularity. Stage 2: Behavior Analysis Agent maps alerts to workflows. Stage 3: Evidence Acquisition Agents (workflow-specific) query enterprise tools (e.g., SIEM, identity, asset context) using typed APIs to validate hypotheses. Stage 4: Reasoning & Coordination Agent aggregates workflow outputs, cross-verifies evidence, applies conservative escalation logic, and emits a structured, auditable report with observables and follow-ups.
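
The four stages in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`Alert`, `Evidence`, `Report`, `WORKFLOW_TOOLS`, the stub tools, and the keyword-based workflow matching) is hypothetical, plain functions stand in for the LLM agents, and the "conservative escalation" rule (escalate if any evidence is suspicious, or if no workflow matched) is an assumed reading of the caption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    alert_id: str
    rule: str
    user: str
    host: str


@dataclass(frozen=True)
class Evidence:
    workflow: str
    tool: str
    finding: str
    suspicious: bool


@dataclass
class Report:
    alert_id: str
    verdict: str  # "escalate" or "benign"
    evidence: List[Evidence] = field(default_factory=list)
    follow_ups: List[str] = field(default_factory=list)


# A typed tool takes an Alert and returns (finding, suspicious).
Tool = Callable[[Alert], Tuple[str, bool]]


# Stub tools: real ones would query SIEM, identity, and asset systems.
def siem_lookup(alert: Alert) -> Tuple[str, bool]:
    return (f"login for {alert.user} from a previously unseen network", True)


def identity_lookup(alert: Alert) -> Tuple[str, bool]:
    return (f"{alert.user} is an active account with MFA enrolled", False)


def asset_lookup(alert: Alert) -> Tuple[str, bool]:
    return (f"{alert.host} is a managed workstation", False)


# Hypothetical mapping from workflow to its typed tools and trigger keyword.
WORKFLOW_TOOLS: Dict[str, Dict[str, Tool]] = {
    "suspicious_login": {"siem": siem_lookup, "identity": identity_lookup},
    "malware_execution": {"asset": asset_lookup},
}
WORKFLOW_KEYWORDS: Dict[str, str] = {
    "suspicious_login": "login",
    "malware_execution": "process",
}


def behavior_analysis(alert: Alert) -> List[str]:
    """Stage 2: map the alert to workflows (keyword match stands in for an LLM)."""
    return [wf for wf, kw in WORKFLOW_KEYWORDS.items() if kw in alert.rule]


def acquire_evidence(alert: Alert, workflow: str) -> List[Evidence]:
    """Stage 3: run each typed tool registered for the workflow."""
    results = []
    for tool_name, tool in WORKFLOW_TOOLS[workflow].items():
        finding, suspicious = tool(alert)
        results.append(Evidence(workflow, tool_name, finding, suspicious))
    return results


def reason_and_coordinate(alert: Alert, evidence: List[Evidence]) -> Report:
    """Stage 4: aggregate evidence and apply the assumed conservative rule."""
    if not evidence:
        return Report(alert.alert_id, "escalate", [],
                      ["no workflow matched; route to an analyst"])
    flagged = [e for e in evidence if e.suspicious]
    if flagged:
        follow_ups = [f"review {e.tool} finding: {e.finding}" for e in flagged]
        return Report(alert.alert_id, "escalate", evidence, follow_ups)
    return Report(alert.alert_id, "benign", evidence, [])


def orchestrate(alert: Alert) -> Report:
    """Stage 1: run the stages in order and return the structured report."""
    workflows = behavior_analysis(alert)
    evidence = [e for wf in workflows for e in acquire_evidence(alert, wf)]
    return reason_and_coordinate(alert, evidence)
```

Calling `orchestrate(Alert("a-1", "impossible_travel_login", "jdoe", "ws-042"))` returns a `Report` whose `evidence` list records which tool produced each finding, which is the property that makes the decision auditable. Keeping the tools behind a single typed signature means a workflow can gain or lose tools without changes to the reasoning stage.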