A Survey on Data-Centric Recommender Systems

Riwei Lai, Rui Chen, Chi Zhang

TL;DR

Data-centric recommender systems (Data-Centric RSs) address the data bottleneck by treating data quality and quantity as primary levers for performance. The survey formalizes the definitions of recommendation data and the data-enhancement pipeline $D' = f(D)$, then maps the existing literature to three core data issues, each with representative methods: incompleteness, noise, and bias. It covers remedies for incomplete data (attribute completion, interaction augmentation), data-denoising strategies, and debiasing approaches, and extends the discussion to multimodal data, LLMs, AutoML, and transparency. The article also discusses evaluation challenges and outlines future research directions, offering a practical taxonomy and a roadmap for researchers and practitioners.
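
The notation $D' = f(D)$ can be read as "apply an enhancement function $f$ to the raw data $D$ before any model sees it". The snippet below is a minimal sketch of that reading, not the survey's method: the `(user_id, item_id, rating)` schema, the 1 to 5 rating range, and the helper names `drop_duplicates`, `drop_out_of_range`, and `compose` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical schema: (user_id, item_id, rating).
Interaction = Tuple[str, str, float]
Dataset = List[Interaction]
Enhancer = Callable[[Dataset], Dataset]


def drop_duplicates(data: Dataset) -> Dataset:
    """Keep the first interaction seen for each (user, item) pair."""
    seen = set()
    out = []
    for user, item, rating in data:
        if (user, item) not in seen:
            seen.add((user, item))
            out.append((user, item, rating))
    return out


def drop_out_of_range(data: Dataset, lo: float = 1.0, hi: float = 5.0) -> Dataset:
    """Treat ratings outside [lo, hi] as noise and discard them (assumed rule)."""
    return [(u, i, r) for (u, i, r) in data if lo <= r <= hi]


def compose(*steps: Enhancer) -> Enhancer:
    """Build f as the left-to-right composition of enhancement steps."""
    def f(data: Dataset) -> Dataset:
        for step in steps:
            data = step(data)
        return data
    return f


# Toy raw data D with one duplicate and one out-of-range rating.
D = [("u1", "i1", 5.0), ("u1", "i1", 5.0), ("u2", "i3", 9.0), ("u2", "i2", 3.0)]

f = compose(drop_duplicates, drop_out_of_range)
D_prime = f(D)  # D' = f(D); the recommender would then be trained on D_prime
print(D_prime)  # [('u1', 'i1', 5.0), ('u2', 'i2', 3.0)]
```

Keeping each step as a plain function from dataset to dataset means that steps for completion, denoising, or debiasing could be added or reordered without touching the downstream model.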

Abstract

Recommender systems (RSs) have become an essential tool for mitigating information overload in a range of real-world applications. Recent trends in RSs have revealed a major paradigm shift, moving the spotlight from model-centric innovations to data-centric efforts (e.g., improving data quality and quantity). This evolution has given rise to the concept of data-centric recommender systems (Data-Centric RSs), marking a significant development in the field. This survey provides the first systematic overview of Data-Centric RSs, covering 1) the foundational concepts of recommendation data and Data-Centric RSs; 2) three primary issues of recommendation data; 3) recent research developed to address these issues; and 4) several potential future directions of Data-Centric RSs.

Paper Structure

This paper contains 28 sections, 5 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Three types of recommendation data.
  • Figure 2: Model-Centric RSs vs. Data-Centric RSs.
  • Figure 3: Overview of data issues in RSs.
  • Figure 4: An illustration of data bias in RSs.
  • Figure 5: Categorization of data denoising methods in RSs.