Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm
Li Cuihong, Huang Xiaowen, Yin Chuanhuan, Sang Jitao
TL;DR
The paper tackles membership inference attacks (MIAs) on LLM-based recommender systems, where shadow-model approaches struggle due to the scale and diversity of training data. It introduces a two-stage knowledge-distillation framework that builds a discriminative reference model and uses fused features from its penultimate layer to train an attack classifier. Experiments across multiple datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and LLMs (T5-base, GPT-2, LLaMA3) show significant performance gains over shadow-based MIAs and single-feature baselines, highlighting practical privacy risks in LLM-driven recommendations. The work demonstrates the value of distillation-based reference models and feature fusion for MIAs against large-scale LLM-based recommendation systems, and suggests avenues for broader applicability and stronger defenses.
Abstract
A Membership Inference Attack (MIA) aims to determine whether a specific data sample was included in the training dataset of a target model. Traditional MIA approaches rely on shadow models to mimic the target model's behavior, but their effectiveness diminishes for Large Language Model (LLM)-based recommendation systems due to the scale and complexity of the training data. This paper introduces a novel knowledge distillation-based MIA paradigm tailored for LLM-based recommendation systems. Our method constructs a reference model via distillation, applying distinct strategies to member and non-member data to enhance its discriminative capability. The paradigm extracts fused features (e.g., confidence, entropy, loss, and hidden layer vectors) from the reference model to train an attack model, overcoming the limitations of individual features. Extensive experiments on extended datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and diverse LLMs (T5, GPT-2, LLaMA3) demonstrate that our approach significantly outperforms shadow model-based MIAs and individual-feature baselines. These results show that the attack is practical against LLM-driven recommender systems.
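To make the idea of fused features concrete, the following is a minimal sketch, not the authors' implementation. It assumes the reference model exposes output logits and a hidden-layer vector for each sample, and it concatenates confidence, entropy, and loss with that vector. The function names `log_softmax` and `fused_features`, and the choice of cross-entropy against a known target index as the loss, are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def fused_features(logits, true_index, hidden_vector):
    """Build one attack-model input from reference-model outputs.

    Hypothetical helper: concatenates three scalar signals with the
    hidden-layer vector. The exact fusion used in the paper may differ.
    """
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]

    confidence = max(probs)                                   # top predicted probability
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))  # Shannon entropy in nats
    loss = -log_probs[true_index]                              # cross-entropy on the target

    return [confidence, entropy, loss] + list(hidden_vector)


# Example: three candidate outputs and a 2-dimensional hidden vector.
features = fused_features([2.0, 0.5, -1.0], true_index=0, hidden_vector=[0.1, -0.3])
```

Each resulting feature vector, labeled as member or non-member, could then be passed to any binary classifier acting as the attack model.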
