StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models
Yang Feng, Xudong Pan
TL;DR
StruPhantom studies the vulnerability of black-box, LLM-driven tabular agents to indirect prompt injection (IPI), exploiting it by embedding malicious instructions in the structured data the agent processes. It introduces an optimization pipeline based on constrained Monte Carlo Tree Search (MCTS), guided by a shadow ReAct agent and an off-topic evaluator, that evolves attack templates to maximize the attack success rate (ASR). Across CSV, XLSX, XML, and JSON formats, and on real platforms such as Doubao and Coze, the optimized templates achieve markedly higher ASR than manual baselines, illustrating concrete security risks for tabular data pipelines. The work underscores the need for robust defenses, including strict input validation, interpretable auditing, and decoupled processing, to mitigate IPI threats in industrial LLM-powered tabular systems.
Abstract
The proliferation of autonomous agents powered by large language models (LLMs) has revolutionized popular business applications that handle tabular data, giving rise to tabular agents. Although LLMs have been observed to be vulnerable to prompt injection attacks from external data sources, tabular agents impose strict data formats and predefined rules on the attacker's payload, so an injected payload is ineffective unless the agent navigates multiple layers of structured data and incorporates it. To address this challenge, we present StruPhantom, a novel attack that specifically targets black-box LLM-powered tabular agents. Our attack uses an evolutionary optimization procedure that continually refines attack payloads via the proposed constrained Monte Carlo Tree Search (MCTS), augmented by an off-topic evaluator. StruPhantom systematically explores and exploits the weaknesses of target applications to achieve goal hijacking. Our evaluation validates the effectiveness of StruPhantom across various LLM-based agents, including those on real-world platforms, and across attack scenarios. Our attack achieves success rates over 50% higher than baselines in forcing the application's responses to contain phishing links or malicious code.
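The abstract names the search procedure without detailing it. As an illustration only, the following is a minimal, generic UCT-style Monte Carlo Tree Search over abstract templates, not the authors' implementation. All of the following are hypothetical assumptions: a template is a tuple of placeholder segment labels; `mutate` stands in for the template-mutation step; `toy_reward` stands in for the score a shadow agent would assign; `on_topic` stands in for the off-topic evaluator; and "constrained" is interpreted here as a depth limit plus pruning of off-topic candidates before expansion, which may differ from the paper's actual constraints.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a template is a tuple of abstract segment labels.
Template = Tuple[str, ...]


@dataclass
class Node:
    template: Template
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    untried: List[Template] = field(default_factory=list)  # candidates not yet expanded
    visits: int = 0
    value: float = 0.0  # sum of rewards backpropagated through this node

    def uct_child(self, c: float) -> "Node":
        # UCB1: mean reward (exploitation) plus an exploration bonus.
        return max(
            self.children,
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def constrained_mcts(
    root_template: Template,
    mutate: Callable[[Template], List[Template]],
    reward: Callable[[Template], float],
    on_topic: Callable[[Template], bool],
    iterations: int = 200,
    max_depth: int = 4,
    c: float = 1.4,
    seed: int = 0,
) -> Template:
    rng = random.Random(seed)
    # Assumed constraint 1: off-topic candidates never enter the tree.
    root = Node(root_template, untried=[t for t in mutate(root_template) if on_topic(t)])
    best, best_score = root_template, reward(root_template)
    for _ in range(iterations):
        node, depth = root, 0
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children:
            node = node.uct_child(c)
            depth += 1
        # 2. Expansion, subject to assumed constraint 2: a maximum tree depth.
        if node.untried and depth < max_depth:
            t = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(t, parent=node, untried=[m for m in mutate(t) if on_topic(m)])
            node.children.append(child)
            node = child
        # 3. Evaluation with the (stand-in) scoring function.
        score = reward(node.template)
        if score > best_score:
            best, best_score = node.template, score
        # 4. Backpropagation of the score up to the root.
        while node is not None:
            node.visits += 1
            node.value += score
            node = node.parent
    return best


# Toy usage with placeholder segments (no real payloads).
SEGMENTS = ("s0", "s1", "s2", "s3", "off")


def mutate(t: Template) -> List[Template]:
    # Hypothetical mutation operator: append one segment.
    return [t + (s,) for s in SEGMENTS]


def toy_reward(t: Template) -> float:
    # Stand-in for a shadow-agent score: count of "s2" minus a length penalty.
    return t.count("s2") - 0.1 * len(t)


def on_topic(t: Template) -> bool:
    # Stand-in off-topic evaluator: reject any template containing "off".
    return "off" not in t


best = constrained_mcts((), mutate, toy_reward, on_topic, iterations=300, max_depth=3)
```

Filtering candidates before they enter a node's `untried` list keeps the search budget on candidates the evaluator accepts, and the depth limit bounds template growth; the exploration constant `c` trades off revisiting high-scoring branches against trying new ones.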
