
Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)

Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma, Ken Huang

TL;DR

This work introduces XAMT, a unified bilevel optimization framework for covert memory tampering in heterogeneous multi-agent architectures that couple MARL and RAG systems. It formalizes a minimal-perturbation constraint R(δ) and derives two instantiations, XAMT-RL and XAMT-RAG, that hijack centralized memory components while evading detection. The authors provide rigorous mathematical formulations, differentiable solution strategies, and comprehensive evaluation protocols on SMAC and SafeRAG, showing that sub-percent poison rates can still achieve substantial target impact. The study highlights a new class of training-time threats that challenge trust, verification, and intrinsic safety in MAS, and discusses defense strategies including adaptive, multi-modal defenses and memory resilience mechanisms. Together, these contributions chart a path toward intrinsically safer MAS by foregrounding memory-centric vulnerabilities and the need for robust, scalable defenses beyond perimeter-based detection.

Abstract

The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness assessment. Modern MAS are inherently heterogeneous, integrating conventional Multi-Agent Reinforcement Learning (MARL) with emerging Large Language Model (LLM) agent architectures that use Retrieval-Augmented Generation (RAG). A critical shared vulnerability is their reliance on centralized memory components: the shared Experience Replay (ER) buffer in MARL and the external Knowledge Base (K) in RAG agents. This paper proposes XAMT (Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures), a novel framework that formalizes attack generation as a bilevel optimization problem. The Upper Level minimizes the perturbation magnitude R(δ) to enforce covertness while maximizing the divergence of system behavior toward an adversary-defined target; this divergence is evaluated at the parameters produced by the Lower Level, which models the victim system's routine learning on the perturbed memory. We provide rigorous mathematical instantiations for centralized-training, decentralized-execution (CTDE) MARL algorithms and for RAG-based LLM agents, demonstrating that bilevel optimization uniquely crafts stealthy, minimal-perturbation poisons that evade detection heuristics. Comprehensive experimental protocols use the SMAC and SafeRAG benchmarks to quantify effectiveness at sub-percent poison rates (ρ ≤ 1% in MARL, ρ ≤ 0.1% in RAG). XAMT defines a new, unified class of training-time threats that is essential for developing intrinsically secure MAS, with implications for trust, formal verification, and defensive strategies that prioritize intrinsic safety over perimeter-based detection.
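The nested program described in the abstract can be written compactly as below. This is a hedged sketch, not the paper's own equation: the scalar trade-off weight $\lambda > 0$ that combines the two Upper-Level goals and the budget form $|\operatorname{supp}(\delta)| \le \rho\,|\mathcal{M}|$ are assumptions, with $\mathcal{M}$ standing for either the replay buffer $\mathcal{D}$ or the knowledge base $\mathcal{K}$. The second display is the standard implicit-function-theorem hypergradient, one common way to differentiate through the Lower Level, valid when $L_{\mathcal{S}}$ is twice differentiable, its Hessian in $\theta$ is invertible at $\theta^*$, and $L_{\mathcal{A}}$ depends on $\delta$ only through $\theta^*$.

```latex
\begin{aligned}
\min_{\delta}\quad & R(\delta) - \lambda\, L_{\mathcal{A}}\big(\theta^*(\delta)\big) \\
\text{s.t.}\quad & \theta^*(\delta) \in \operatorname*{arg\,min}_{\theta}\, L_{\mathcal{S}}(\theta;\, \mathcal{M}+\delta), \\
& \lvert \operatorname{supp}(\delta) \rvert \le \rho\, \lvert \mathcal{M} \rvert
\end{aligned}

\nabla_{\delta} L_{\mathcal{A}}\big(\theta^*(\delta)\big)
  = -\,\nabla^2_{\delta\theta} L_{\mathcal{S}}
    \big(\nabla^2_{\theta\theta} L_{\mathcal{S}}\big)^{-1}
    \nabla_{\theta} L_{\mathcal{A}} \Big|_{\theta = \theta^*(\delta)}
```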

Paper Structure

This paper contains 50 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: XAMT at a glance: effectiveness vs. covertness. Left: Attack Success / Utility Drop versus poison rate ($\rho$, log scale) for MARL (XAMT-RL) and RAG (XAMT-RAG), compared with non-optimized baselines. XAMT achieves high impact at sub-percent $\rho$. Right: Semantic deviation (a proxy for detectability in RAG) versus Attack Success Rate (ASR). XAMT-RAG attains higher ASR with lower semantic drift, reflecting the bilevel “minimal perturbation” objective.
  • Figure 2: Conceptual Architecture of the XAMT Bilevel Optimization Framework. The diagram illustrates the nested optimization problem: the Upper Level (Attacker $\mathcal{A}$) minimizes the perturbation magnitude $R(\delta)$ while maximizing adversarial impact $L_{\mathcal{A}}(\theta^*)$, where $\theta^*$ is the resulting system parameter set. The Lower Level (Victim System $\mathcal{S}$) models the routine learning process minimizing $L_{\mathcal{S}}$ on the corrupted memory $\mathcal{M}+\delta$.
  • Figure 3: Heterogeneous Targets of the XAMT Attack. This dual-path diagram contrasts the two system architectures and their common vulnerability: (A) XAMT-RL targets the shared Experience Replay Buffer ($\mathcal{D}$) used by the centralized critic in CTDE MARL. (B) XAMT-RAG targets the external Knowledge Base ($\mathcal{K}$) used by the retriever to augment the LLM agent's generation. In both cases, the perturbation ($\delta$) is covertly injected into the centralized memory layer.
  • Figure 4: Effectiveness and Covertness of XAMT-RL in SMAC. (Left) A learning-curve plot comparing the average Win Rate (Utility) vs. Training Steps for a clean QMIX agent (Baseline), a QMIX agent trained under a uniform random poisoning attack, and a QMIX agent trained under the bilevel-optimized (BO) XAMT-RL attack. This plot is expected to show XAMT achieving a high utility drop post-convergence. (Right) A bar chart comparing XAMT-RL performance across different attack types, plotting the achieved Policy Utility Drop against the required Poison Rate ($\rho \le 1\%$) and Perturbation Magnitude ($L_\infty$).
  • Figure 5: Attack Success Rate (ASR) vs. Covertness for XAMT-RAG. (Left) A line graph plotting ASR versus the Poison Rate ($\rho$, typically $\le 0.1\%$). This graph is designed to show XAMT-RAG achieving a significantly higher ASR at extremely low poison rates than baseline RAG poisoning methods that do not use bilevel optimization (non-BO). (Right) A scatter plot visualizing the trade-off between semantic covertness (e.g., Perplexity/Semantic Distance) and ASR for different poison-text generation strategies, demonstrating XAMT's ability to minimize semantic deviation while maximizing attack success.
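To make the nested structure of Figure 2 concrete, here is a toy sketch in which the Lower Level is 1-D ridge regression with a closed-form solution. It is an illustrative stand-in, not XAMT-RL or XAMT-RAG. The helper names (`lower_level_theta`, `upper_objective`, `solve_bilevel`), the choice $R(\delta) = \lVert\delta\rVert^2$, the weight `lam`, the target `theta_target`, and the boolean `mask` that plays the role of the poison budget $\rho$ are all hypothetical.

```python
"""Toy bilevel sketch: 1-D ridge regression as the lower level (illustrative only)."""
from typing import List, Sequence


def lower_level_theta(x: Sequence[float], y: Sequence[float],
                      delta: Sequence[float], mu: float) -> float:
    # Lower level (hypothetical stand-in for the victim's routine learning):
    # ridge regression on perturbed targets y + delta, solved in closed form
    # as theta* = sum(x * (y + delta)) / (sum(x^2) + mu).
    num = sum(xi * (yi + di) for xi, yi, di in zip(x, y, delta))
    den = sum(xi * xi for xi in x) + mu
    return num / den


def upper_objective(delta: Sequence[float], x: Sequence[float],
                    y: Sequence[float], mu: float, lam: float,
                    theta_target: float) -> float:
    # Upper level (assumed scalarization): perturbation size ||delta||^2 plus
    # lam times the squared distance between theta* and the target.
    theta = lower_level_theta(x, y, delta, mu)
    return sum(d * d for d in delta) + lam * (theta - theta_target) ** 2


def solve_bilevel(x: Sequence[float], y: Sequence[float],
                  mask: Sequence[bool], mu: float = 1.0, lam: float = 10.0,
                  theta_target: float = 0.0, lr: float = 0.05,
                  steps: int = 2000) -> List[float]:
    # Gradient descent on delta. Only entries with mask[i] = True may change,
    # so the fraction sum(mask) / len(mask) acts as the poison rate.
    den = sum(xi * xi for xi in x) + mu
    delta = [0.0] * len(x)
    for _ in range(steps):
        theta = lower_level_theta(x, y, delta, mu)
        # Hypergradient: d theta*/d delta_i = x_i / den, exact for this closed
        # form. General learners need unrolling or implicit differentiation.
        grad = [2.0 * d + 2.0 * lam * (theta - theta_target) * xi / den
                for d, xi in zip(delta, x)]
        delta = [d - lr * g if m else 0.0
                 for d, g, m in zip(delta, grad, mask)]
    return delta


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    d = solve_bilevel(xs, ys, mask=[False, False, False, True])
    print("delta:", d)
    print("theta* clean:", lower_level_theta(xs, ys, [0.0] * 4, 1.0))
    print("theta* perturbed:", lower_level_theta(xs, ys, d, 1.0))
```

Because the objective is quadratic in the masked entries of δ, gradient descent here converges to the unique minimizer; raising `lam` yields a larger δ and a θ* closer to the target, which is the effectiveness-versus-covertness trade-off in miniature.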