ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

Zhuowen Yuan, Zhaorun Chen, Zhen Xiang, Nathaniel D. Bastian, Seyyed Hadi Hashemi, Chaowei Xiao, Wenbo Guo, Bo Li

Abstract

Existing research on LLM agent security mainly focuses on prompt injection and unsafe input/output behaviors. However, as agents increasingly rely on third-party tools and MCP servers, a new class of supply-chain threats has emerged, in which malicious behaviors are embedded in seemingly benign tools, silently hijacking agent execution, leaking sensitive data, or triggering unauthorized actions. Despite their growing impact, there is currently no comprehensive benchmark for evaluating such threats. To bridge this gap, we introduce SC-Inject-Bench, a large-scale benchmark of supply-chain threats comprising over 10,000 malicious MCP tools grounded in a taxonomy of 25+ attack types derived from MITRE ATT&CK. We observe that existing MCP scanners and semantic guardrails perform poorly on this benchmark. Motivated by this finding, we propose ShieldNet, a network-level guardrail framework that detects supply-chain poisoning by observing real network interactions rather than surface-level tool traces. ShieldNet integrates a man-in-the-middle (MITM) proxy and an event extractor to identify critical network behaviors, which are then processed by a lightweight classifier for attack detection. Extensive experiments show that ShieldNet achieves strong detection performance (up to 0.995 F-1 with only 0.8% false positives) while introducing little runtime overhead, substantially outperforming existing MCP scanners and LLM-based guardrails.

Paper Structure

This paper contains 38 sections, 6 equations, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Comparison between semantic-level and network-level views of tool execution. Stealthy tool injection attacks preserve benign tool interfaces and input/output behavior, evading semantic-layer inspection, but expose distinctive runtime behaviors at the network level, which motivates network-based guardrails.
  • Figure 2: Overview of our network-based MCP guardrail. Raw packet traces and decrypted application-layer traffic are captured and rendered as a time-ordered sequence of network events, which are analyzed by a guardrail model to classify executions as benign or assign specific risk categories. Crucially, the detection pipeline operates solely on network behavior, without relying on tool inputs and outputs, call graphs, or tool metadata, enabling the identification of malicious actions that are otherwise invisible at the semantic layer.
  • Figure 3: End-to-end data curation pipeline for SC-Inject-Bench. We construct a large-scale benchmark targeting diverse agent supply-chain threats, comprising over 10,000 malicious MCP tools grounded in 25+ attack types derived from MITRE ATT&CK. Specifically, we begin by collecting benign MCP servers, integrate verified malicious scripts into tool implementations, execute the tools under agent control, and verify attacks using network-level signals, thereby producing validated benign and malicious execution traces.
  • Figure 4: F-1 scores for multi-class detection. Our method achieves consistently strong performance across almost all classes, whereas baselines show significant variability, indicating weaker generalization to diverse attack behaviors.
  • Figure 5: Real-world streaming detection demo of ShieldNet. Claude Code interacts with injected MCP servers (left), while ShieldNet performs online network interception and sliding-window classification (right), visualizing structured network events and detection results in real time. Malicious tool executions are automatically identified during runtime based on network-level behavior.
  • ...and 9 more figures
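
The flow described in the captions of Figures 2 and 5 (structured network events, rendered in time order and classified over sliding windows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `NetworkEvent` schema, the window size and stride, the host names, and the keyword-based `toy_classifier` are all hypothetical and stand in for the event extractor and trained classifier, which this section does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class NetworkEvent:
    """One structured network event (hypothetical schema, not from the paper)."""
    timestamp: float   # seconds since capture start
    protocol: str      # e.g. "DNS", "HTTP", "TLS"
    destination: str   # host or IP address contacted
    summary: str       # short description of the request


def render(events: Sequence[NetworkEvent]) -> str:
    """Render events as a time-ordered text sequence, one event per line."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return "\n".join(
        f"[{e.timestamp:.3f}] {e.protocol} {e.destination} {e.summary}"
        for e in ordered
    )


def sliding_windows(
    events: Sequence[NetworkEvent], size: int, stride: int
) -> Iterator[List[NetworkEvent]]:
    """Yield overlapping windows of events; the last window covers the tail."""
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive")
    ordered = sorted(events, key=lambda e: e.timestamp)
    start = 0
    while ordered:
        yield ordered[start:start + size]
        if start + size >= len(ordered):
            break
        start += stride


def toy_classifier(window_text: str) -> str:
    """Assumed stand-in for a trained model: flags one hard-coded host."""
    return "malicious" if "unknown-host.example" in window_text else "benign"


def detect(
    events: Sequence[NetworkEvent],
    classifier: Callable[[str], str] = toy_classifier,
    size: int = 3,
    stride: int = 2,
) -> List[str]:
    """Classify each window using only its rendered network events."""
    return [classifier(render(w)) for w in sliding_windows(events, size, stride)]


if __name__ == "__main__":
    trace = [
        NetworkEvent(0.0, "DNS", "api.example.com", "A query"),
        NetworkEvent(1.0, "HTTP", "api.example.com", "GET /v1/items"),
        NetworkEvent(2.0, "HTTP", "api.example.com", "GET /v1/items/7"),
        NetworkEvent(3.0, "DNS", "unknown-host.example", "A query"),
        NetworkEvent(4.0, "HTTP", "unknown-host.example", "POST /upload"),
    ]
    print(detect(trace))
```

Note that `detect` never sees tool inputs, outputs, or metadata, which mirrors the design point in the Figure 2 caption: the decision depends only on observed network behavior. In a real deployment the classifier argument would be a learned model rather than a keyword check.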