A Survey on Data-Centric Recommender Systems

Riwei Lai; Rui Chen; Chi Zhang

TL;DR

Data-centric recommender systems (Data-Centric RSs) address the data bottleneck by treating data quality and quantity as primary levers for performance. The survey formalizes recommendation data and the data-enhancement pipeline $D' = f(D)$, then maps the existing literature to three core data issues (incompleteness, noise, and bias), with representative methods for each. It covers remedies for incomplete data (attribute completion, interaction augmentation), data-denoising strategies, and debiasing approaches, and extends the discussion to multimodal data, LLMs, AutoML, and transparency. The article also discusses evaluation challenges and outlines future research directions, offering a practical taxonomy and a roadmap for researchers and practitioners.

Abstract

Recommender systems (RSs) have become an essential tool for mitigating information overload in a range of real-world applications. Recent trends in RSs have revealed a major paradigm shift, moving the spotlight from model-centric innovations to data-centric efforts (e.g., improving data quality and quantity). This evolution has given rise to the concept of data-centric recommender systems (Data-Centric RSs), marking a significant development in the field. This survey provides the first systematic overview of Data-Centric RSs, covering 1) the foundational concepts of recommendation data and Data-Centric RSs; 2) three primary issues of recommendation data; 3) recent research developed to address these issues; and 4) several potential future directions of Data-Centric RSs.

Paper Structure (28 sections, 5 equations, 5 figures, 1 table)

Introduction
Formulation
Recommendation Data
Data-Centric RSs
Data Issues
Data Incompleteness
Data Noise
Data Bias
Research Progress
Handling Incomplete Data
Attribute Completion
Interaction Augmentation
Discussion
Handling Noisy Data
Reweighting-Based Denoising
...and 13 more sections

Figures (5)

Figure 1: Three types of recommendation data.
Figure 2: Model-Centric RSs vs. Data-Centric RSs.
Figure 3: Overview of data issues in RSs.
Figure 4: An illustration of data bias in RSs.
Figure 5: Categorization of data denoising methods in RSs.
