ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models

Mitchell Piehl, Zhaohan Xi, Zuobin Xiong, Pan He, Muchao Ye

TL;DR

This paper presents the first systematic study of black-box adversarial memory injection attacks that target the similarity-based retrieval mechanism in long-term memory-augmented LLMs, revealing security risks that persist across memory designs and application scenarios.

Abstract

Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent reasoning across interactions. However, recent research finds that LLMs become more vulnerable because memory provides extra attack surfaces. In this paper, we present the first systematic study of black-box adversarial memory injection attacks that target the similarity-based retrieval mechanism in long-term memory-augmented LLMs. We introduce ER-MIA, a unified framework that exposes this vulnerability and formalizes two realistic attack settings: content-based attacks and question-targeted attacks. In these settings, ER-MIA includes an arsenal of composable attack primitives and ensemble attacks that achieve high success rates under minimal attacker assumptions. Extensive experiments across multiple LLMs and long-term memory systems demonstrate that similarity-based retrieval constitutes a fundamental and system-level vulnerability, revealing security risks that persist across memory designs and application scenarios.

Paper Structure

This paper contains 49 sections, 3 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: ER-MIA exploits similarity-based retrieval in long-term memory systems by injecting malicious memories that are highly similar to clean ones in embedding space, leading to corrupted and degraded LLM reasoning without access to model parameters or the memory system internals.
  • Figure 2: Overview of the proposed pipeline exposing the vulnerability of LLMs augmented with long-term memory systems. ER-MIA focuses on content-based and question-targeted attack scenarios, with an attack arsenal that exploits the similarity-based retrieval in memory systems. The injected adversarial memory compromises the reasoning of victim LLMs.
  • Figure 3: Qualitative example of co-retrieval between a clean memory and an embedding-close Harsh Instruction adversarial memory, resulting in corrupted reasoning.
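
The retrieval behavior that Figures 1 and 3 describe can be illustrated with a toy sketch. It is not the paper's implementation: the three-dimensional vectors are hand-picked stand-ins for real embeddings, and the names `cosine`, `retrieve`, `memory`, and `injected` are hypothetical. It only shows why a top-k cosine-similarity retriever returns an embedding-close injected entry alongside the clean one.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, memory, k=2):
    """Return the k memory entries most similar to the query."""
    ranked = sorted(memory, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return ranked[:k]

# Hypothetical toy memory store: hand-picked vectors, not output of a real encoder.
memory = [
    {"id": "clean_trip", "vec": [0.9, 0.1, 0.0]},
    {"id": "clean_diet", "vec": [0.0, 0.2, 0.9]},
    {"id": "clean_job", "vec": [0.1, 0.9, 0.1]},
]

# An injected entry placed close to "clean_trip" in embedding space (assumption).
injected = {"id": "injected_trip", "vec": [0.88, 0.15, 0.02]}

query = [1.0, 0.1, 0.0]

before = [m["id"] for m in retrieve(query, memory)]
after = [m["id"] for m in retrieve(query, memory + [injected])]

print("top-2 before injection:", before)
print("top-2 after injection: ", after)
```

In this toy setting the injected entry displaces an unrelated clean memory and is retrieved together with the clean entry it imitates, which is the co-retrieval pattern Figure 3 illustrates. The retriever needs no knowledge of where an entry came from, which is why the attack requires no access to model parameters or memory internals.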