Effects of Using Synthetic Data on Deep Recommender Models' Performance

Fatih Cihan Taskin, Ilknur Akcay, Muhammed Pesen, Said Aldemir, Ipek Iraz Esin, Furkan Durmus

TL;DR

The results show that the inclusion of generated negative samples consistently improves the Area Under the Curve (AUC) scores, highlighting the potential of data augmentation strategies to address issues of data sparsity and imbalance, ultimately leading to improved performance of recommender systems.

Abstract

Recommender systems enhance user experiences by suggesting items based on individual preferences. However, these systems frequently face data imbalance, in which negative interactions far outnumber positive ones. This imbalance can bias recommendations toward popular items. This study investigates how effectively synthetic data generation addresses data imbalance within recommender systems. We generated synthetic data with six different methods and integrated the generated samples into the original dataset. Our results show that including generated negative samples consistently improves Area Under the Curve (AUC) scores. This impact of synthetic negative samples highlights the potential of data augmentation strategies to address data sparsity and imbalance, ultimately improving the performance of recommender systems.
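The augmentation and evaluation loop described in the abstract can be illustrated with a minimal sketch. The six generation methods are not named in this section, so the uniform random negative sampler below is a stand-in and not one of the paper's methods. The function names `sample_negatives` and `auc`, and the `(user, item)` tuple layout, are assumptions made only for illustration.

```python
import random


def sample_negatives(positives, num_items, n_per_positive, seed=0, max_tries=1000):
    """Draw (user, item, 0) rows for user-item pairs not seen as positives.

    Stand-in sampler (uniform over items); NOT one of the paper's six methods.
    """
    rng = random.Random(seed)
    seen = set(positives)
    negatives = []
    for user, _ in positives:
        drawn, tries = 0, 0
        # Bound the retries so a user who interacted with every item cannot loop forever.
        while drawn < n_per_positive and tries < max_tries:
            tries += 1
            item = rng.randrange(num_items)
            if (user, item) not in seen:
                seen.add((user, item))
                negatives.append((user, item, 0))
                drawn += 1
    return negatives


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of rows, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two observed positive interactions over five items.
positives = [(0, 1), (1, 2)]
synthetic = sample_negatives(positives, num_items=5, n_per_positive=2)
augmented = [(u, i, 1) for u, i in positives] + synthetic
```

In a real experiment the model would be trained once on the original data and once on `augmented`, and the two AUC values on a held-out set would be compared.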

Paper Structure

This paper contains 6 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: A real dataset is augmented with synthetic data generated from the original dataset to enhance the training process. The combined dataset passes through an embedding operation that transforms sparse features into embedding vectors. The resulting input embedding matrix is then fed into the CTR prediction model, which captures complex relationships among the features to predict the likelihood that a user clicks on an advertisement.
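The embedding and prediction stages in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the caption does not specify the CTR model, so a single logistic layer over the concatenated embeddings stands in for it. The names `build_embedding_tables`, `embed`, and `predict_ctr` are hypothetical, as is the choice of randomly initialized, untrained tables.

```python
import math
import random


def build_embedding_tables(field_sizes, dim, seed=0):
    """Create one randomly initialized embedding table per sparse field."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]
        for size in field_sizes
    ]


def embed(row, tables):
    """Look up one vector per sparse field and concatenate them."""
    vec = []
    for field_idx, category_id in enumerate(row):
        vec.extend(tables[field_idx][category_id])
    return vec


def predict_ctr(row, tables, weights, bias=0.0):
    """Stand-in CTR model: sigmoid of a linear score over the embedded row."""
    x = embed(row, tables)
    logit = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical example: two sparse fields with 3 and 4 categories, 2-d embeddings.
tables = build_embedding_tables([3, 4], dim=2)
row = [1, 2]  # category ids, one per field
probability = predict_ctr(row, tables, weights=[0.0] * 4)
```

A deep CTR model would replace the linear score with layers that model feature interactions, and the embedding tables would be learned jointly with those layers on the augmented dataset.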