A Survey on Data-Centric Recommender Systems
Riwei Lai, Rui Chen, Chi Zhang
TL;DR
Data-centric recommender systems (Data-Centric RSs) address the data bottleneck by treating data quality and quantity as primary levers for performance. The survey formalizes data definitions and the $D' = f(D)$ data-enhancement pipeline, then maps existing literature to three core data issues: incompleteness, noise, and bias, with representative methods for each. It covers remedies for incomplete data (attribute completion, interaction augmentation), data-denoising strategies, and debiasing approaches, and extends the discussion to multimodal data, LLMs, AutoML, and transparency. The article also discusses evaluation challenges and outlines future research directions, offering a practical taxonomy and a roadmap for researchers and practitioners.
Abstract
Recommender systems (RSs) have become an essential tool for mitigating information overload in a range of real-world applications. Recent trends in RSs have revealed a major paradigm shift, moving the spotlight from model-centric innovations to data-centric efforts (e.g., improving data quality and quantity). This evolution has given rise to the concept of data-centric recommender systems (Data-Centric RSs), marking a significant development in the field. This survey provides the first systematic overview of Data-Centric RSs, covering 1) the foundational concepts of recommendation data and Data-Centric RSs; 2) three primary issues of recommendation data; 3) recent research developed to address these issues; and 4) several potential future directions of Data-Centric RSs.
