CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent

Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, Feiran Huang

TL;DR

The paper investigates safety vulnerabilities of LLM-empowered recommender systems under black-box access and introduces CheatAgent, an autonomous LLM-based attacker that perturbs input prompts to degrade recommendations. CheatAgent combines insertion positioning, which identifies the token positions with the greatest impact, with a prompt-tuned LLM perturbation generator. The generator works in two stages, initial policy generation followed by a self-reflection policy optimization loop, and aims to maximize the loss $\mathcal{L}_{Rec}(\hat{X},Y)$ while preserving semantic similarity to the original prompt. Across three real-world datasets and two victim models, CheatAgent substantially reduces recommendation quality and outperforms baselines. These results expose a critical security risk in current LLM-driven RecSys and show that LLM-based attacks are practical even under restricted access. The work highlights the need for robust defenses and evaluation protocols to ensure the trustworthiness of LLM-empowered recommender systems in real-world deployments.
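
Read as an optimization problem, the objective described above can be sketched as follows. This is an illustrative formalization rather than the paper's stated equation: the benign prompt $X$, the similarity function $\mathrm{sim}$, and the threshold $\tau$ are assumed placeholders for the semantic-similarity requirement, and $\hat{X}$ and $Y$ are presumed to denote the perturbed prompt and the ground-truth recommendation.

$$
\max_{\hat{X}} \; \mathcal{L}_{Rec}(\hat{X}, Y) \quad \text{subject to} \quad \mathrm{sim}(\hat{X}, X) \ge \tau
$$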

Abstract

Recently, Large Language Model (LLM)-empowered recommender systems (RecSys) have brought significant advances in personalized user experience and have attracted considerable attention. Despite this progress, the safety vulnerability of LLM-empowered RecSys remains largely under-investigated. Given security and privacy concerns, it is more practical to focus on attacking black-box RecSys, where attackers can only observe the system's inputs and outputs. However, traditional attack approaches that employ reinforcement learning (RL) agents are not effective against LLM-empowered RecSys because of their limited capabilities in processing complex textual inputs, planning, and reasoning. LLMs, on the other hand, provide unprecedented opportunities to serve as attack agents against RecSys because of their impressive capability in simulating human-like decision-making processes. Therefore, in this paper, we propose a novel attack framework called CheatAgent that harnesses the human-like capabilities of LLMs, in which an LLM-based agent is developed to attack LLM-empowered RecSys. Specifically, our method first identifies the insertion position that yields maximum impact with minimal input modification. The LLM agent then generates adversarial perturbations to insert at the target positions. To further improve the quality of the generated perturbations, we use prompt tuning to iteratively improve the attack strategy via feedback from the victim RecSys. Extensive experiments across three real-world datasets demonstrate the effectiveness of our proposed attack method.
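
The insertion-positioning step can be illustrated with a minimal sketch. It assumes that a position's impact is measured by masking the token there and observing how much a black-box loss rises; the abstract does not specify the mechanism, so the masking strategy, the function names, and the toy loss below are all hypothetical stand-ins rather than the authors' implementation.

```python
from typing import Callable, List, Sequence


def rank_insertion_positions(
    tokens: Sequence[str],
    loss_fn: Callable[[List[str]], float],
    mask_token: str = "[MASK]",
) -> List[int]:
    """Rank token indices by how much masking each one raises the loss.

    Assumption: masking is used as a proxy for a position's impact.
    loss_fn is treated as a black box (inputs in, a score out).
    """
    base = loss_fn(list(tokens))
    impacts = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = mask_token
        impacts.append((loss_fn(masked) - base, i))
    # Highest impact first; ties are broken by the earlier position.
    impacts.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in impacts]


def insert_perturbation(
    tokens: Sequence[str], position: int, perturbation: str
) -> List[str]:
    """Return a copy of tokens with the perturbation inserted before position."""
    out = list(tokens)
    out.insert(position, perturbation)
    return out


def toy_loss(tokens: List[str]) -> float:
    """Hypothetical victim loss: it rises when informative items are missing."""
    important = {"Inception": 0.6, "Interstellar": 0.3}
    return 1.0 - sum(w for t, w in important.items() if t in tokens)


tokens = ["User", "watched", "Inception", "and", "Interstellar"]
ranking = rank_insertion_positions(tokens, toy_loss)
# Insert a placeholder perturbation at the highest-impact position.
adversarial = insert_perturbation(tokens, ranking[0], "<perturbation>")
```

In this toy setup the two item tokens rank highest, so the perturbation is placed next to the token the stand-in loss depends on most. A real attack would replace `toy_loss` with queries to the victim RecSys and the placeholder string with text produced by the LLM agent.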

Paper Structure

This paper contains 34 sections, 5 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: The illustration of the adversarial attack for recommender systems in the era of LLMs. Attackers leverage the LLM agent to insert some tokens (e.g., words) or items in the user's prompt to manipulate the LLM-empowered recommender system to make incorrect decisions.
  • Figure 2: The overall framework of the proposed CheatAgent. Insertion positioning first locates the token with the maximum impact. Then, LLM agent-empowered perturbation generation uses the LLM as the attacker agent to generate adversarial perturbations. It contains two processes: 1) Initial Policy Generation searches for a strong initial attack policy, and 2) Self-Reflection Policy Optimization fine-tunes the prefix prompt to update the attack policy of the LLM-based agent.
  • Figure 3: Attack performance of different methods (Victim model: TALLRec).
  • Figure 4: The semantic similarity between the benign and adversarial prompts.
  • Figure 5: Effect of the hyper-parameters $k$ and $n$.