LLM-driven Provenance Forensics for Threat Investigation and Detection
Kunal Mukherjee, Murat Kantarcioglu
TL;DR
ProvSEEK tackles scalability and interpretability gaps in provenance-based forensics by introducing an agentic, LLM-powered framework that orchestrates specialized tools and bounded provenance queries. It combines Retrieval-Augmented Generation, chain-of-thought reasoning, and a verification-first design to ground all conclusions in verifiable system evidence. Across seven public DARPA datasets, ProvSEEK outperforms retrieval-based baselines on intelligence extraction, with up to 34% gains in contextual precision/recall. On threat detection it outperforms a baseline agent and a state-of-the-art provenance-based intrusion detection system, with 22%/29% gains in precision/recall. Latency and token usage stay practical throughout. Extensive ablations validate each component's contribution, and the results point toward deployable, grounded, interpretable provenance forensics that require no task-specific training on the target environment.
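The "bounded provenance queries" and "verification-first design" above can be illustrated with a small sketch. This is not ProvSEEK's implementation. The `Event` record, `bounded_query`, and `verify_claims` are hypothetical names, and the sketch assumes provenance is a flat list of (subject, action, object, timestamp) events. An LLM-proposed claim is kept only if a size-limited, time-windowed query returns a matching event. Otherwise it is flagged as unsupported.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    subject: str  # e.g. a process name (hypothetical schema)
    action: str   # e.g. "write", "exec", "connect"
    obj: str      # e.g. a file path or socket address
    ts: int       # timestamp in arbitrary units

def bounded_query(events, entity, start, end, limit=50):
    """Return at most `limit` events touching `entity` within [start, end], oldest first."""
    hits = [e for e in events
            if (e.subject == entity or e.obj == entity) and start <= e.ts <= end]
    hits.sort(key=lambda e: e.ts)
    return hits[:limit]

def verify_claims(claims, events, window=(0, 10**9), limit=50):
    """Split (subject, action, obj) claims into verified and unsupported lists.

    A claim is verified only if a bounded query around its subject returns
    an event with exactly the same subject, action and object.
    """
    verified, unsupported = [], []
    for subj, action, obj in claims:
        evidence = bounded_query(events, subj, *window, limit=limit)
        match = any((e.subject, e.action, e.obj) == (subj, action, obj) for e in evidence)
        (verified if match else unsupported).append((subj, action, obj))
    return verified, unsupported

# Usage: one claim is backed by a logged event, the other is not.
log = [
    Event("firefox", "write", "/tmp/payload", 10),
    Event("bash", "exec", "/tmp/payload", 12),
    Event("payload", "connect", "10.0.0.5:443", 15),
]
claims = [("bash", "exec", "/tmp/payload"), ("payload", "read", "/etc/shadow")]
ok, bad = verify_claims(claims, log)
print(ok)   # [('bash', 'exec', '/tmp/payload')]
print(bad)  # [('payload', 'read', '/etc/shadow')]
```

The `limit` and time window keep each query's result small enough to fit in an LLM context. The exact-match check is a deliberately strict stand-in for whatever grounding criterion the real system uses.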
Abstract
We introduce PROVSEEK, an LLM-powered agentic framework for automated provenance-driven forensic analysis and threat intelligence extraction. PROVSEEK employs specialized toolchains to dynamically retrieve relevant context by generating precise, context-aware queries that fuse knowledge from threat reports with evidence from system provenance data. The framework resolves provenance queries, orchestrates multiple role-specific agents, and synthesizes structured forensic summaries that can be verified against ground truth. By combining agent orchestration with Retrieval-Augmented Generation (RAG), chain-of-thought (CoT) reasoning, and data-guided filtration using a behavioral model, PROVSEEK enables adaptive multi-step analysis that iteratively refines hypotheses, verifies supporting evidence, and produces scalable, interpretable forensic explanations of attack behaviors. PROVSEEK is designed for automated threat investigation without task-specific training data, enabling forensic-style investigation even when no prior knowledge of the environment is available. We conduct a comprehensive evaluation on publicly available DARPA datasets. On the intelligence extraction task, PROVSEEK outperforms retrieval-based methods, achieving a 34% improvement in contextual precision/recall. On the threat detection task, PROVSEEK achieves 22%/29% higher precision/recall than both a baseline agent approach and a state-of-the-art (SOTA) Provenance-based Intrusion Detection System (PIDS). In our scalability study, we show that as the database size grows 50x, PROVSEEK's token usage grows only 1.42x and its latency 1.63x, making it well suited for large-scale deployment. We also conduct ablation and error analysis studies to show how PROVSEEK's individual components affect detection performance.
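To see how sublinear the reported scaling is, one can fit a power law cost ∝ size^α through the two reported points. This is an illustrative reading of the numbers, not a model stated by the authors. Two data points do not establish a power law. `implied_exponent` is a hypothetical helper.

```python
import math

def implied_exponent(cost_ratio: float, size_ratio: float) -> float:
    """Exponent alpha such that cost_ratio == size_ratio ** alpha.

    Assumes (for illustration only) that cost scales as a power of database size.
    """
    return math.log(cost_ratio) / math.log(size_ratio)

SIZE_RATIO = 50.0  # reported growth in database size
for name, ratio in [("tokens", 1.42), ("latency", 1.63)]:
    alpha = implied_exponent(ratio, SIZE_RATIO)
    print(f"{name}: alpha = {alpha:.3f}")
```

Under this assumption, α is about 0.09 for token usage and about 0.12 for latency. Linear scaling would give α = 1, so each unit of added data costs far less than proportionally more.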
