Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm

Li Cuihong, Huang Xiaowen, Yin Chuanhuan, Sang Jitao

TL;DR

The paper tackles membership inference attacks (MIAs) on LLM-based recommender systems, where shadow-model approaches struggle because of the scale and diversity of the training data. It introduces a two-stage knowledge-distillation framework that builds a discriminative reference model and uses fused features from that model's penultimate layer to train an attack classifier. Experiments across four datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and three LLMs (T5-base, GPT-2, LLaMA3) show significant gains over shadow-based MIAs and single-feature baselines, highlighting practical privacy risks in LLM-driven recommendation. The work demonstrates the value of distillation-based reference models and feature fusion for effective MIAs against large-scale LLM-driven recommendation systems, and points to broader applicability and stronger defenses as avenues for further work.

Abstract

Membership Inference Attack (MIA) aims to determine whether a specific data sample was included in the training dataset of a target model. Traditional MIA approaches rely on shadow models to mimic the target model's behavior, but their effectiveness diminishes for Large Language Model (LLM)-based recommendation systems due to the scale and complexity of the training data. This paper introduces a novel knowledge-distillation-based MIA paradigm tailored to LLM-based recommendation systems. Our method constructs a reference model via distillation, applying distinct distillation strategies to member and non-member data to enhance its discriminative capability. The paradigm then extracts fused features (e.g., confidence, entropy, loss, and hidden-layer vectors) from the reference model to train an attack model, overcoming the limitations of any individual feature. Extensive experiments on extended datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and diverse LLMs (T5, GPT-2, LLaMA3) demonstrate that our approach significantly outperforms shadow model-based MIAs and individual-feature baselines. The results demonstrate its practicality as a privacy attack on LLM-driven recommender systems.
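
The feature-fusion step described above can be illustrated with a minimal sketch. It assumes the reference model exposes, for each query, logits over candidate items and a penultimate-layer hidden vector; confidence is taken as the maximum softmax probability, entropy as the Shannon entropy of that distribution, and loss as the cross-entropy of the ground-truth item. These feature definitions, the helper names (`fused_features`, `train_attack_classifier`, `predict_member`), and the use of plain logistic regression as the attack model are illustrative assumptions rather than the paper's exact implementation, and the distillation stage that produces the reference model is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating feature fusion for an MIA attack model.
# Assumption: the reference model yields logits over candidates and a
# penultimate-layer hidden vector for each query (distillation not shown).


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_features(logits: Sequence[float], target: int,
                   hidden: Sequence[float]) -> List[float]:
    """Concatenate [confidence, entropy, loss] with the hidden vector."""
    probs = softmax(logits)
    confidence = max(probs)  # assumed definition: max softmax probability
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    loss = -math.log(probs[target])  # cross-entropy of the ground-truth item
    return [confidence, entropy, loss] + list(hidden)


def _sigmoid(z: float) -> float:
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_attack_classifier(xs: List[List[float]], ys: List[int],
                            lr: float = 0.5,
                            epochs: int = 500) -> Tuple[List[float], float]:
    """Stand-in attack model: logistic regression via full-batch gradient descent."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss w.r.t. the logit is (p - y).
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_member(w: List[float], b: float, x: List[float]) -> bool:
    """Return True if the sample is predicted to be a training member."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


if __name__ == "__main__":
    # Toy data: members (label 1) get confident, low-loss predictions.
    # Hidden vectors are identical placeholders, so only the scalar
    # features separate these samples.
    samples = [
        ([5.0, 0.0, 0.0], 0, [0.2, 0.1], 1),
        ([4.0, 1.0, 0.0], 0, [0.2, 0.1], 1),
        ([0.0, 1.0, 2.0], 0, [0.2, 0.1], 0),
        ([1.0, 1.0, 1.0], 0, [0.2, 0.1], 0),
    ]
    xs = [fused_features(logits, target, hidden)
          for logits, target, hidden, _ in samples]
    ys = [label for *_, label in samples]
    w, b = train_attack_classifier(xs, ys)
    print([predict_member(w, b, x) for x in xs])
```

Concatenating the scalar signals with the hidden vector lets the attack model weigh them jointly, which is the motivation the abstract gives for fusion over any single feature; a nonlinear classifier such as a small MLP could replace the linear model in this sketch.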

Paper Structure

This paper contains 13 sections, 7 equations, 5 figures, 9 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An overview of our paradigm.
  • Figure 2: Differences between our paradigm (bottom) and shadow model-based MIA (top), and the advantages of our approach.
  • Figure 3: Feature distributions of the reference model (T5) in our paradigm, the model before distillation (raw model), and the shadow model. In each panel, the blue and red areas represent the distributions of members and non-members, respectively; the horizontal axis gives the feature value and the vertical axis the corresponding data density.
  • Figure 4: Impact of different $\alpha$ values on the performance of our paradigm on each of the four datasets.
  • Figure 5: Example data samples from the four datasets.