LLM-enabled Applications Require System-Level Threat Monitoring
Yedi Zhang, Haoyu Wang, Xianglin Yang, Jin Song Dong, Jun Sun
TL;DR
The paper argues that deploying LLM-enabled applications creates new reliability and security risks that model improvements alone cannot address. These risks call for system-level threat monitoring, akin to endpoint detection and response (EDR) for traditional software. The paper introduces a taxonomy-driven monitoring schema that maps fourteen threat categories to concrete monitoring artifacts and audit-logging practices across the end-to-end workflow. The categories are prompt injection, adversarial inputs, response manipulation, denial of service (DoS), data poisoning, model poisoning, data leakage, cross-context disclosure, memorisation leakage, theft, watermark evasion, drift, misinformation, and misuse. The authors emphasize post-monitoring incident analysis and identify open challenges, such as curating corpora of suspicious patterns, the latency of context inspection, and limited observability in closed deployments. They also discuss alternative views, such as red-teaming and guardrails, which they treat as complementary approaches. Overall, the work advocates continuous, system-wide telemetry and forensic capabilities as prerequisites for reliable operation and robust incident response in LLM-enabled applications, enabling timely detection, containment, and recovery.
Abstract
LLM-enabled applications are rapidly reshaping the software ecosystem by using large language models as core reasoning components for complex task execution. This paradigm shift, however, introduces fundamentally new reliability challenges and significantly expands the security attack surface, due to the non-deterministic, learning-driven, and difficult-to-verify nature of LLM behavior. In light of these emerging and unavoidable safety challenges, we argue that such risks should be treated as expected operational conditions rather than exceptional events, necessitating a dedicated incident-response perspective. Consequently, the primary barrier to trustworthy deployment is not further improving model capability but establishing system-level threat monitoring mechanisms that can detect and contextualize security-relevant anomalies after deployment. This aspect remains largely underexplored compared with testing and guardrail-based defenses. Accordingly, this position paper advocates systematic and comprehensive monitoring of security threats in LLM-enabled applications as a prerequisite for reliable operation and a foundation for dedicated incident-response frameworks.
