Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)
Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma, Ken Huang
TL;DR
This work introduces XAMT, a unified bilevel optimization framework for covert memory tampering in heterogeneous multi-agent architectures that couple MARL and RAG systems. It formalizes a minimal-perturbation constraint R(δ) and derives two instantiations, XAMT-RL and XAMT-RAG, that hijack centralized memory components while evading detection. The authors provide rigorous mathematical formulations, differentiable solution strategies, and comprehensive evaluation protocols on SMAC and SafeRAG for quantifying how much target impact an attack achieves at sub-percent poison rates. The study highlights a new class of training-time threats that challenge trust, verification, and intrinsic safety in MAS, and discusses defense strategies, including adaptive, multi-modal defenses and memory resilience mechanisms. Together, these contributions chart a path toward intrinsically safer MAS by foregrounding memory-centric vulnerabilities and the need for robust, scalable defenses beyond perimeter-based detection.
Abstract
The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness assessment. Modern MAS are inherently heterogeneous, integrating conventional Multi-Agent Reinforcement Learning (MARL) with emerging Large Language Model (LLM) agent architectures that use Retrieval-Augmented Generation (RAG). A critical shared vulnerability is their reliance on centralized memory components: the shared Experience Replay (ER) buffer in MARL and the external Knowledge Base (K) in RAG agents. This paper proposes XAMT (Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures), a novel framework that formalizes attack generation as a bilevel optimization problem. The Upper Level minimizes the perturbation magnitude δ to enforce covertness while maximizing the divergence of system behavior, as determined by the Lower Level, toward an adversary-defined target. We provide rigorous mathematical instantiations for Centralized Training with Decentralized Execution (CTDE) MARL algorithms and RAG-based LLM agents, demonstrating that bilevel optimization uniquely crafts stealthy, minimal-perturbation poisons that evade detection heuristics. Comprehensive experimental protocols use the SMAC and SafeRAG benchmarks to quantify effectiveness at sub-percent poison rates (≤ 1% in MARL, ≤ 0.1% in RAG). XAMT defines a new unified class of training-time threats whose study is essential for developing intrinsically secure MAS, with implications for trust, formal verification, and defensive strategies that prioritize intrinsic safety over perimeter-based detection.
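One common way to write a problem of this shape is sketched below. Only δ and R(δ) come from the text above. The memory \mathcal{M} (standing for the ER buffer or the knowledge base K), the system parameters θ, the losses \mathcal{L}_{adv} and \mathcal{L}_{train}, the trade-off weight λ, and the injection operator ⊕ are notation assumed here for illustration, and they may differ from the paper's own formulation.

```latex
\begin{aligned}
\min_{\delta}\quad
  & \mathcal{L}_{\mathrm{adv}}\bigl(\theta^{*}(\delta)\bigr) + \lambda\, R(\delta)
  && \text{(upper level: target behavior and covertness)} \\
\text{s.t.}\quad
  & \theta^{*}(\delta) \in \arg\min_{\theta}\
    \mathcal{L}_{\mathrm{train}}\bigl(\theta;\ \mathcal{M} \oplus \delta\bigr)
  && \text{(lower level: system fitted to tampered memory)}
\end{aligned}
```

In this sketch, a small \mathcal{L}_{adv} means the resulting behavior is close to the adversary-defined target, and λ sets how strongly covertness, measured by R(δ), is weighed against that goal. The lower level stands for however the system adapts to the tampered memory, so the attacker can only shape behavior indirectly, through δ.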
