CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence
Chaochao Chen, Jiaming Zhang, Yizhao Zhang, Li Zhang, Lingjuan Lyu, Yuyuan Li, Biao Gong, Chenggang Yan
TL;DR
The paper addresses privacy concerns around data deletion rights in recommender systems by introducing CURE4Rec, the first comprehensive benchmark for evaluating recommendation unlearning. It defines four evaluation aspects (unlearning completeness, recommendation utility, unlearning efficiency, and recommendation fairness) and three strategies for selecting the unlearning set (core data, edge data, and random data), enabling consistent comparisons between exact unlearning (EU) and approximate unlearning (AU) methods. The study finds that EU methods guarantee completeness but can degrade utility and fairness, while AU methods such as SCIF improve efficiency and fairness at a modest cost in completeness. These findings suggest that fairness and robustness should be considered when designing and evaluating practical unlearning systems. The authors also release their code and datasets to support public benchmarking and reproducibility.
Abstract
With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in models, particularly in recommender systems where historical data contains sensitive user information. Despite recent advances in recommendation unlearning, evaluating unlearning methods comprehensively remains challenging due to the absence of a unified evaluation framework and overlooked aspects of deeper influence, e.g., fairness. To address these gaps, we propose CURE4Rec, the first comprehensive benchmark for recommendation unlearning evaluation. CURE4Rec covers four aspects, i.e., unlearning Completeness, recommendation Utility, unleaRning efficiency, and recommendation fairnEss, under three data selection strategies, i.e., core data, edge data, and random data. Specifically, we consider the deeper influence of unlearning on recommendation fairness and robustness towards data with varying impact levels. We construct multiple datasets with CURE4Rec evaluation and conduct extensive experiments on existing recommendation unlearning methods. Our code is released at https://github.com/xiye7lai/CURE4Rec.
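To make the three data selection strategies concrete, the following is a minimal sketch of how an unlearning set might be drawn from an interaction log. It is not taken from the CURE4Rec code. The function name `select_unlearning_users`, the `ratio` parameter, and in particular the use of a user's interaction count as a proxy for "impact level" are assumptions made for illustration only; the benchmark may define core and edge data differently.

```python
import random
from collections import Counter


def select_unlearning_users(interactions, strategy, ratio=0.1, seed=0):
    """Pick the users whose data will be unlearned (hypothetical helper).

    interactions: iterable of (user_id, item_id) pairs.
    strategy: "core", "edge", or "random".

    ASSUMPTION: a user's impact is approximated by their number of
    interactions. This proxy is illustrative, not the benchmark's definition.
    """
    counts = Counter(user for user, _item in interactions)
    # Rank users from most to least active; break ties by user id
    # so the result is deterministic.
    ranked = sorted(counts, key=lambda u: (-counts[u], u))
    k = max(1, int(len(ranked) * ratio))

    if strategy == "core":
        return set(ranked[:k])       # most active users
    if strategy == "edge":
        return set(ranked[-k:])      # least active users
    if strategy == "random":
        return set(random.Random(seed).sample(ranked, k))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Under this sketch, each unlearning method would be run once per strategy on the same dataset, and the four aspects (completeness, utility, efficiency, fairness) would be measured for each resulting unlearning set.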
