DIET: Learning to Distill Dataset Continually for Recommender Systems

Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen

Abstract

Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model development. This challenge calls for data-efficient approaches that can faithfully approximate full-data training behavior without repeatedly processing the entire evolving data stream. We formulate this problem as \emph{streaming dataset distillation for recommender systems} and propose \textbf{DIET}, a unified framework that maintains a compact distilled dataset which evolves alongside streaming data while preserving training-critical signals. Unlike existing dataset distillation methods that construct a static distilled set, DIET models distilled data as an evolving training memory and updates it in a stage-wise manner to remain aligned with long-term training dynamics. DIET enables effective continual distillation through principled initialization from influential samples and selective updates guided by influence-aware memory addressing within a bi-level optimization framework. Experiments on large-scale recommendation benchmarks demonstrate that DIET compresses training data to as little as \textbf{1-2\%} of the original size while preserving performance trends consistent with full-data training, reducing model iteration cost by up to \textbf{60$\times$}. Moreover, the distilled datasets produced by DIET generalize well across different model architectures, highlighting streaming dataset distillation as a scalable and reusable data foundation for recommender system development.

Paper Structure

This paper contains 48 sections, 22 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: Model performance consistency under data reduction. Correlation between model performance measured on reduced data and on full data. Each point corresponds to a model–dataset pair, with the dashed diagonal indicating ideal fidelity to full-data training. DIET exhibits substantially higher correlation with full-data performance than sampling-based baselines, demonstrating superior preservation of comparative model behavior.
  • Figure 2: DIET operates in two phases under a continual learning paradigm. Phase 1 (left) constructs a boundary memory by selecting task-conditioned influential samples $\mathcal{S}_t$ from each data block $\mathcal{D}_t$ using reference model checkpoints $\phi_t$ with label-conditioned EL2N scoring. The selected samples are converted into embedding--soft label pairs and fused with aligned historical synthetic memory $\mathcal{D}^{syn}_{1:t-1}$ via an alignment estimation module $\mathcal{A}$, yielding the updated synthetic dataset $\mathcal{D}^{syn}_t$. Phase 2 (right) refines the synthetic memory through influence-guided addressing, which selects hard real targets $\mathcal{B}_t^{hard}$ and active synthetic units $\mathcal{M}_t^{active}$. The active memory drives inner-loop training of a proxy model initialized from the reference model, while the synthetic data are updated in the outer loop using a meta-objective on $\mathcal{B}_t^{hard}$, accelerated by RaT-BPTT.
  • Figure 3: Performance comparison across compression ratios with WuKong as the candidate model. Subplots (a) and (b) show results on the KuaiRand and Tmall datasets. The dotted lines represent the full-data upper bound.
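The label-conditioned EL2N scoring used in Phase 1 (Figure 2) to pick influential samples can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of plain softmax probabilities, and the top-k selection rule are assumptions; EL2N here follows its standard definition as the L2 norm of the prediction error.

```python
import numpy as np

def el2n_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """EL2N score per sample: L2 norm of (predicted probabilities - one-hot label).

    probs:  (n_samples, n_classes) model output probabilities
    labels: (n_samples,) integer class labels
    """
    one_hot = np.eye(probs.shape[1])[labels]
    return np.linalg.norm(probs - one_hot, axis=1)

def select_influential(probs: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical selection rule: keep the k samples with the highest EL2N
    scores, i.e. the samples the reference model predicts worst."""
    scores = el2n_scores(probs, labels)
    return np.argsort(scores)[::-1][:k]

# Toy usage: three samples, two classes, all labeled class 0.
probs = np.array([[0.9, 0.1],   # well-predicted -> low score
                  [0.5, 0.5],   # uncertain      -> medium score
                  [0.1, 0.9]])  # mispredicted   -> high score
labels = np.array([0, 0, 0])
print(select_influential(probs, labels, k=2))  # → [2 1]
```

In the framework described above, such scores would be computed against the reference checkpoints $\phi_t$ on each incoming block $\mathcal{D}_t$, and the selected samples would seed the synthetic memory before the bi-level refinement of Phase 2.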