Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems

Derek Lilienthal; Paul Mello; Magdalini Eirinaki; Stas Tiomkin

TL;DR

This work addresses the privacy and data-sparsity challenges of training recommender systems by generating high-quality synthetic datasets. It introduces SDRM, a two-stage approach that maps user-item interactions into a latent Gaussian space with a pretrained MultiVAE, applies a score-based diffusion model in that space to denoise and sample new latents, and decodes the samples back to the original interaction space. The method shows substantial improvements over baselines both when synthetic data augments the real data and when it replaces it, with average gains of 4.30% in Recall@k and 4.65% in NDCG@k, while preserving privacy (≈99% dissimilarity to the original data). Combining a diffusion process with variational inference lets the model draw on the strengths of both paradigms to capture intricate user preferences, offering a practical route to privacy-preserving, data-efficient recommender systems. The work establishes diffusion-based synthetic data generation as a viable alternative to traditional privacy techniques, with implications for industry deployments where regulations constrain data sharing.
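The two-stage pipeline described above can be sketched in NumPy. This is a toy illustration under stated assumptions, not SDRM's implementation: the pretrained MultiVAE is replaced by hypothetical fixed random linear maps (`encode`, `decode`), the learned score network is replaced by the closed-form score of a diagonal Gaussian fitted to the latents, and sampling uses annealed Langevin dynamics in the style of noise-conditional score networks, since the summary does not say which sampler SDRM uses. The binarisation rule (keep each synthetic user's top `m` items) and all sizes are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, latent_dim = 200, 20, 4

# Toy implicit-feedback matrix (1 = user interacted with item). Illustrative data only.
X = (rng.random((n_users, n_items)) < 0.3).astype(float)

# Hypothetical stand-ins for a pretrained MultiVAE: fixed random linear maps.
W_enc = rng.normal(size=(n_items, latent_dim)) / np.sqrt(n_items)
W_dec = rng.normal(size=(latent_dim, n_items))


def encode(x):
    """Stage 1 (toy): map interaction rows to latent vectors."""
    return x @ W_enc


def decode(z):
    """Map latents to per-item probabilities with a softmax (multinomial likelihood)."""
    logits = z @ W_dec
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Stand-in for a learned score network: a diagonal Gaussian fitted to the latents,
# whose noise-perturbed score is known in closed form.
Z = encode(X)
mu, var = Z.mean(axis=0), Z.var(axis=0)


def score(z, sigma):
    """Score of N(mu, diag(var)) convolved with N(0, sigma^2 I)."""
    return -(z - mu) / (var + sigma**2)


def annealed_langevin(n, sigmas, steps=50):
    """Stage 2: sample latents with Langevin dynamics over decreasing noise levels."""
    # Start from N(0, sigma_max^2 I), roughly the most-perturbed marginal when
    # sigma_max is large relative to the spread of the latents.
    z = sigmas[0] * rng.normal(size=(n, latent_dim))
    for sigma in sigmas:
        step = 0.1 * sigma**2  # step size proportional to sigma^2
        for _ in range(steps):
            z = z + 0.5 * step * score(z, sigma) + np.sqrt(step) * rng.normal(size=z.shape)
    return z


sigmas = np.geomspace(5.0, 0.01, num=10)
Z_syn = annealed_langevin(n=100, sigmas=sigmas)
P_syn = decode(Z_syn)

# Binarise: treat each synthetic user's top-m items as interactions (m is an assumption).
m = 6
top = np.argsort(-P_syn, axis=1)[:, :m]
X_syn = np.zeros_like(P_syn)
np.put_along_axis(X_syn, top, 1.0, axis=1)

# Augmentation: append synthetic users to the real matrix.
X_aug = np.vstack([X, X_syn])
```

Stacking `X_syn` under `X` corresponds to the augmentation setting; using `X_syn` on its own corresponds to substitution. In a real system the Gaussian score would be replaced by a network s(z, σ) trained with denoising score matching on the encoded latents.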

Abstract

While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work, we introduce a Score-based Diffusion Recommendation Module (SDRM), which captures the intricate patterns of real-world datasets required to train highly accurate recommender systems. SDRM generates synthetic data that can either replace existing datasets to preserve user privacy or augment them to address excessive data sparsity. In synthesizing various datasets to replace or augment the original data, our method outperforms competing baselines such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models, with an average improvement of 4.30% in Recall@k and 4.65% in NDCG@k.
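The abstract reports results as Recall@k and NDCG@k. The exact variants used in the paper are not stated here, so the sketch below assumes binary relevance, a logarithmic discount, and a Recall@k denominator of min(k, number of held-out items), a convention common in MultiVAE-style evaluation. It also assumes every user has at least one held-out item; the function names are illustrative.

```python
import numpy as np


def recall_at_k(scores, relevant, k):
    """Hits in the top-k divided by min(k, #relevant), averaged over users."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(relevant, topk, axis=1).sum(axis=1)
    denom = np.minimum(k, relevant.sum(axis=1))
    return float(np.mean(hits / denom))


def ndcg_at_k(scores, relevant, k):
    """Binary-relevance NDCG@k averaged over users."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    gains = np.take_along_axis(relevant, topk, axis=1)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))  # rank 1 -> 1/log2(2), ...
    dcg = (gains * discounts).sum(axis=1)
    # Ideal DCG: all relevant items (up to k) ranked first.
    n_rel = np.minimum(k, relevant.sum(axis=1)).astype(int)
    idcg = np.array([discounts[:n].sum() for n in n_rel])
    return float(np.mean(dcg / idcg))


# Two users, four items: predicted scores and held-out interactions.
scores = np.array([[0.9, 0.8, 0.1, 0.4],
                   [0.2, 0.7, 0.6, 0.1]])
relevant = np.array([[1, 0, 0, 1],
                     [0, 0, 1, 0]])
print(recall_at_k(scores, relevant, k=2))  # (1/2 + 1/1) / 2 = 0.75
print(ndcg_at_k(scores, relevant, k=2))    # ≈ 0.622
```

Other common variants divide Recall@k by the full number of relevant items or use graded relevance; reported percentages are only comparable when the same variant is used throughout.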

Paper Structure (32 sections, 10 equations, 3 figures, 8 tables, 2 algorithms)

Introduction
Related Work
Privacy in Recommendations
Synthetic Data Generation
Generative Models in Recommendations
Background
Variational Autoencoders
Diffusion Models
Score-Based Diffusion Recommender Module
SDRM objective
SDRM Training and Sampling
Evaluation
Experiment Setting
Datasets
Baseline Generative Models
...and 17 more sections

Figures (3)

Figure 1: SDRM Training and Sampling
Figure 2: Improvement of SDRM over the MultiVAE++ baseline across all datasets for various sizes of top-$k$ recommendation lists. Average improvement with augmented data: Recall@$k$ 6.81%, NDCG@$k$ 7.73%; with synthetic data: Recall@$k$ 1.79%, NDCG@$k$ 1.56%; combined: Recall@$k$ 4.30%, NDCG@$k$ 4.65%.
Figure 3: Distribution of the number of items and users for the Amazon Digital Music dataset.
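The caption does not state how the combined averages in Figure 2 are computed, but they match an unweighted mean of the augmented and synthetic settings (an inference from the reported numbers, not a statement from the paper):

```latex
\text{Recall@}k:\ \frac{6.81\% + 1.79\%}{2} = 4.30\%, \qquad
\text{NDCG@}k:\ \frac{7.73\% + 1.56\%}{2} = 4.645\% \approx 4.65\%
```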