Table of Contents
Fetching ...

Exploring Backdoor Attack and Defense for LLM-empowered Recommendations

Liangbo Ning, Wenqi Fan, Qing Li

TL;DR

This work reveals a backdoor vulnerability in LLM-empowered recommender systems and introduces BadRec, a poisoning framework that implants triggers into item titles and uses fake users to contaminate training data, achieving near-100% attack success with as little as 1% poisoned data. To counter this threat, the authors propose P-Scanner, an LLM-based poison detector aided by a trigger augmentation agent and iterative adversarial optimization, designed to detect and remove poisoned items with minimal impact on benign recommendations. Extensive experiments on ML1M, LastFM, and STEAM with two victim models (LLaRA and TALLRec) demonstrate both the attack's effectiveness and the defense's robustness across trigger forms (char-, word-, sentence-level). The results highlight the practical significance of integrating domain-specific defense mechanisms with LLM-enabled RecSys to maintain safety and reliability in real-world deployments. Together, BadRec and P-Scanner offer a comprehensive view of backdoor risks and a scalable approach to preserve trustworthiness in LLM-augmented recommendations.

Abstract

The fusion of Large Language Models (LLMs) with recommender systems (RecSys) has dramatically advanced personalized recommendations and drawn extensive attention. Despite the impressive progress, the safety of LLM-based RecSys against backdoor attacks remains largely under-explored. In this paper, we raise a new problem: Can a backdoor with a specific trigger be injected into LLM-based Recsys, leading to the manipulation of the recommendation responses when the backdoor trigger is appended to an item's title? To investigate the vulnerabilities of LLM-based RecSys under backdoor attacks, we propose a new attack framework termed Backdoor Injection Poisoning for RecSys (BadRec). BadRec perturbs the items' titles with triggers and employs several fake users to interact with these items, effectively poisoning the training set and injecting backdoors into LLM-based RecSys. Comprehensive experiments reveal that poisoning just 1% of the training data with adversarial examples is sufficient to successfully implant backdoors, enabling manipulation of recommendations. To further mitigate such a security threat, we propose a universal defense strategy called Poison Scanner (P-Scanner). Specifically, we introduce an LLM-based poison scanner to detect the poisoned items by leveraging the powerful language understanding and rich knowledge of LLMs. A trigger augmentation agent is employed to generate diverse synthetic triggers to guide the poison scanner in learning domain-specific knowledge of the poisoned item detection task. Extensive experiments on three real-world datasets validate the effectiveness of the proposed P-Scanner.

Exploring Backdoor Attack and Defense for LLM-empowered Recommendations

TL;DR

This work reveals a backdoor vulnerability in LLM-empowered recommender systems and introduces BadRec, a poisoning framework that implants triggers into item titles and uses fake users to contaminate training data, achieving near-100% attack success with as little as 1% poisoned data. To counter this threat, the authors propose P-Scanner, an LLM-based poison detector aided by a trigger augmentation agent and iterative adversarial optimization, designed to detect and remove poisoned items with minimal impact on benign recommendations. Extensive experiments on ML1M, LastFM, and STEAM with two victim models (LLaRA and TALLRec) demonstrate both the attack's effectiveness and the defense's robustness across trigger forms (char-, word-, sentence-level). The results highlight the practical significance of integrating domain-specific defense mechanisms with LLM-enabled RecSys to maintain safety and reliability in real-world deployments. Together, BadRec and P-Scanner offer a comprehensive view of backdoor risks and a scalable approach to preserve trustworthiness in LLM-augmented recommendations.

Abstract

The fusion of Large Language Models (LLMs) with recommender systems (RecSys) has dramatically advanced personalized recommendations and drawn extensive attention. Despite the impressive progress, the safety of LLM-based RecSys against backdoor attacks remains largely under-explored. In this paper, we raise a new problem: Can a backdoor with a specific trigger be injected into LLM-based Recsys, leading to the manipulation of the recommendation responses when the backdoor trigger is appended to an item's title? To investigate the vulnerabilities of LLM-based RecSys under backdoor attacks, we propose a new attack framework termed Backdoor Injection Poisoning for RecSys (BadRec). BadRec perturbs the items' titles with triggers and employs several fake users to interact with these items, effectively poisoning the training set and injecting backdoors into LLM-based RecSys. Comprehensive experiments reveal that poisoning just 1% of the training data with adversarial examples is sufficient to successfully implant backdoors, enabling manipulation of recommendations. To further mitigate such a security threat, we propose a universal defense strategy called Poison Scanner (P-Scanner). Specifically, we introduce an LLM-based poison scanner to detect the poisoned items by leveraging the powerful language understanding and rich knowledge of LLMs. A trigger augmentation agent is employed to generate diverse synthetic triggers to guide the poison scanner in learning domain-specific knowledge of the poisoned item detection task. Extensive experiments on three real-world datasets validate the effectiveness of the proposed P-Scanner.

Paper Structure

This paper contains 30 sections, 9 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: An illustration of backdoor attacks for LLM-empowered RecSys. LLM-based RecSys will activate the backdoor and recommend items with the predefined trigger on their titles to most users regardless of their preferences. For the item without the trigger, RecSys will perform normally.
  • Figure 2: The overall framework of the Backdoor Injection Poisoning for RecSys. Attackers first inject triggers into item titles and generate fake users to interact with these items as adversarial examples. After training on the poisoned training set, LLM-empowered RecSys will learn both knowledge of recommendations and the backdoor.
  • Figure 3: Attack Performance of BadRec for LLM-empowered RecSys (TALLRec).
  • Figure 4: The overall framework of the proposed P-Scanner. The framework consists of three steps: 1) LLM-Empowered Trigger Augmentation generates diverse triggers by introducing a trigger augmentation agent, 2) Policy Optimization for Poison Detection fine-tunes the LLMs for the poisoned item detection task, and 3) Iteratively Adversarial Optimization updates both the poison scanner and trigger augmentation agent to refines their policy.
  • Figure 5: Defense performance on TALLRec (Char-level).
  • ...and 2 more figures