Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu; Zhi Yang; Zhiheng Jin; Shuhe Wang; Heng Zhang; Yanlin Fei; Lingfeng Zeng; Fangqi Lou; Shuo Zhang; Tu Hu; Jingping Liu; Rongze Chen; Xingyu Zhu; Kunyi Wang; Chaofa Yuan; Xin Guo; Zhaowei Liu; Feipeng Zhang; Jie Huang; Huacan Wang; Ronghao Chen; Liwen Zhang

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang

TL;DR

The paper addresses security risks in autonomous LLM-powered agents, where traditional always-on, external checks impose latency and brittleness across long, multi-step workflows. It introduces Spider-Sense, a framework that embeds Intrinsic Risk Sensing (IRS) into the agent’s execution to enable selective, event-driven defense, and a Hierarchical Adaptive Screening (HAC) that uses fast pattern matching plus deep reasoning only when needed. It also provides S$^2$Bench, a lifecycle-aware benchmark with realistic tool execution and attack scenarios to rigorously evaluate in-situ interception. Empirical results show Spider-Sense achieves state-of-the-art defense performance with the lowest Attack Success Rate (ASR) and False Positive Rate (FPR) while incurring only modest latency overhead, demonstrating practical, scalable protection for real-world agent deployments.

Abstract

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense framework, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, the Spider-Sense invokes a hierarchical defence mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S$^2$Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3\%.

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

TL;DR

Bench, a lifecycle-aware benchmark with realistic tool execution and attack scenarios to rigorously evaluate in-situ interception. Empirical results show Spider-Sense achieves state-of-the-art defense performance with the lowest Attack Success Rate (ASR) and False Positive Rate (FPR) while incurring only modest latency overhead, demonstrating practical, scalable protection for real-world agent deployments.

Abstract

Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3\%.

Paper Structure (56 sections, 6 equations, 5 figures, 4 tables)

This paper contains 56 sections, 6 equations, 5 figures, 4 tables.

Introduction
Related Work
LLM-Level Safety Alignment and Guardrails
Agent-Level Defensive Mechanisms
Spider-Sense Framework
Problem Formulation
Overview
Intrinsic Risk Sensing (IRS)
Hierarchical Adaptive Screening
S$^2$Bench Dataset
Multi-stage and Multi-scenario
Authenticity
Hard Benign Prompts
Realistic Attack Simulation
Experiments
...and 41 more sections

Figures (5)

Figure 1: Comparison between the Existing Framework and the Spider-Sense Framework. The existing approach relies on forced, repetitive external security checks at every stage, leading to high latency. In contrast, Spider-Sense utilizes proactive, endogenous risk awareness to dynamically trigger targeted analysis only when anomalies (like suspicious tool outputs) are sensed.
Figure 2: Overview of Spider-Sense. Intrinsic risk sensing operates across all agent stages, while the sensing indicator is triggered only at the observation stage (highlighted by a yellow warning symbol) in this example.
Figure 3: Ablation study on stage-wise risk sensing.
Figure 4: Ablation study on hierarchical adaptive screening.
Figure 5: In-situ interception of a tool-return injection attack at the observation stage using IRS and hierarchical adaptive screening.

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

TL;DR

Abstract

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Authors

TL;DR

Abstract

Table of Contents

Figures (5)