DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents

Shiyi Yang, Zhibo Hu, Xinshu Li, Chen Wang, Tong Yu, Xiwei Xu, Liming Zhu, Lina Yao

TL;DR

This work identifies memory as a critical attack surface in LLM-powered agentic recommender systems and proposes DrunkAgent, a black-box framework that perturbs the target item's memory via semantically meaningful textual triggers. DrunkAgent employs a Surrogate Module to simulate victim behavior, a Generation Module to craft high-quality triggers through greedy search and linguistic refinement, and a Strategy Module to preserve memory perturbations during interactions. Extensive experiments across collaborative filtering (CF), retrieval-augmented, and sequential agentic RSs on real-world datasets demonstrate strong transferability, universality across target items, and stealthiness, with only modest degradation in overall recommendation performance and resilience against paraphrasing defenses. The findings highlight a pressing need for memory-aware defenses and robust, trustworthy design of memory-driven recommender agents in practical deployments.

Abstract

Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs) to achieve personalized behavior modeling, where the memory mechanism plays a pivotal role in enabling the agents to autonomously explore, learn and self-evolve from real-world interactions. However, this very mechanism, serving as a contextual repository, inherently exposes an attack surface for potential adversarial manipulations. Despite the central role of memory, the robustness of agentic RSs in the face of such threats remains largely underexplored. Previous works suffer from semantic mismatches or rely on static embeddings or pre-defined prompts, none of which is designed for dynamic systems, especially for the dynamic memory states of LLM agents. This challenge is exacerbated by the black-box nature of commercial recommenders. To address these problems, we present the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents, revealing their security limitations and guiding efforts to strengthen system resilience and trustworthiness. Specifically, we propose a novel black-box attack framework named DrunkAgent. DrunkAgent crafts semantically meaningful adversarial textual triggers for target item promotions and introduces a series of strategies to maximize the trigger effect by corrupting the memory updates during the interactions. The triggers and strategies are optimized on a surrogate model, which makes DrunkAgent transferable and stealthy. Extensive experiments on real-world datasets across diverse agentic RSs, including collaborative filtering, retrieval augmentation and sequential recommendations, demonstrate the generalizability, transferability and stealthiness of DrunkAgent.
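
The summary above mentions that triggers are crafted with a greedy search against a surrogate. The snippet below is only a toy sketch of what a generic greedy search over a candidate vocabulary could look like. It is not the authors' algorithm: the function names, the vocabulary, and the keyword-based `toy_surrogate_score` are all hypothetical stand-ins, and no real recommender or LLM is involved.

```python
from typing import Callable, List, Sequence


def greedy_trigger_search(
    score: Callable[[Sequence[str]], float],
    vocabulary: Sequence[str],
    max_len: int,
) -> List[str]:
    """Greedily append the token that most improves score(trigger).

    Generic greedy search; `score` is an assumed black-box surrogate.
    """
    trigger: List[str] = []
    best = score(trigger)
    for _ in range(max_len):
        # Score every one-token extension of the current trigger.
        candidates = [
            (score(trigger + [tok]), tok)
            for tok in vocabulary
            if tok not in trigger
        ]
        if not candidates:
            break
        cand_score, cand_tok = max(candidates, key=lambda c: c[0])
        if cand_score <= best:
            break  # no extension improves the surrogate score
        trigger.append(cand_tok)
        best = cand_score
    return trigger


# Hypothetical surrogate: rewards words a toy "user profile" prefers
# and slightly penalizes everything else.
LIKED = {"durable": 2.0, "lightweight": 1.5, "waterproof": 1.0}


def toy_surrogate_score(tokens: Sequence[str]) -> float:
    return sum(LIKED.get(t, -0.1) for t in tokens)


VOCAB = ["cheap", "durable", "waterproof", "lightweight"]
result = greedy_trigger_search(toy_surrogate_score, VOCAB, max_len=3)
print(result)
```

Greedy search is cheap because each step needs only one surrogate query per candidate token, but it can miss combinations that only score well together. That may be one reason the summary pairs it with a separate linguistic refinement step.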

Paper Structure

This paper contains 44 sections, 7 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: DrunkAgent Overview. The generation module produces adversarial textual triggers for promoting target items. The strategy module creates adversarial strategies to 'get the target agents drunk' to allow the triggers to achieve maximum impact. The triggers and the strategies are optimized on the surrogate module to improve the transferability and stealthiness of black-box attacks.
  • Figure 2: Attack Universality across Target Items.
  • Figure 3: Attack Stealthiness. The overall distribution of recommendation performance differences of all the victim agentic models on all real-world datasets before and after the attacks.
  • Figure 4: Attack Imperceptibility. Perturbed text's perplexity on real datasets.
  • Figure 5: DrunkAgent's Robustness to Defense Mechanisms.
  • ...and 1 more figure
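
Figure 4 uses perplexity as the measure of how natural the perturbed text looks. As a minimal sketch of that metric, the snippet below computes perplexity as the exponential of the average negative log-probability per token. The source does not say which language model the authors used, so a toy add-one-smoothed unigram model stands in here. The function name and the example sentences are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Sequence


def unigram_perplexity(tokens: Sequence[str], reference: Sequence[str]) -> float:
    """Perplexity of `tokens` under an add-one-smoothed unigram model
    estimated from `reference` (a toy stand-in for a real language model)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = Counter(reference)
    # Vocabulary covers both corpora so unseen tokens get non-zero mass.
    vocab_size = len(set(reference) | set(tokens))
    total = len(reference)
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity = exp(-(1/N) * sum_i log p(w_i)).
    return math.exp(-log_prob / len(tokens))


reference = "the bag is light and the bag is strong".split()
natural = "the bag is strong".split()
garbled = "zx qv bag kk".split()

ppl_natural = unigram_perplexity(natural, reference)
ppl_garbled = unigram_perplexity(garbled, reference)
print(ppl_natural < ppl_garbled)
```

Lower perplexity means the text is closer to what the model expects. A trigger made of fluent, on-topic words therefore scores lower than a string of rare or nonsense tokens.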