Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation

Shuyao Wang, Yongduo Sui, Jiancan Wu, Zhi Zheng, Hui Xiong

TL;DR

The paper addresses the high training and inference costs of large-scale recommender systems by proposing Dynamic Sparse Learning (DSL), which trains a lightweight sparse model from random initialization and periodically re-evaluates weight importance and the sparsity distribution while holding the parameter budget fixed. DSL interleaves three phases: sparsity initialization, sparse learning, and dynamic exploration (pruning followed by growth), with the update ratio following a cosine-annealed schedule. Because the model is sparse from the first step, the savings apply end to end, from training through inference. Empirically, DSL reduces training multiply-accumulate operations (MACs), inference MACs, and memory across diverse backbone models and six benchmark datasets while maintaining or improving recommendation performance, and it outperforms many knowledge distillation (KD), AutoML, and pruning baselines in cost-performance trade-offs. The work also analyzes sparsity distributions, convergence behavior, and hyperparameter effects to justify the method and inform practical deployment.

Abstract

In the realm of deep learning-based recommendation systems, the increasing computational demands, driven by the growing number of users and items, pose a significant challenge to practical deployment. This challenge is primarily twofold: reducing the model size while effectively learning user and item representations for efficient recommendations. Despite considerable advancements in model compression and architecture search, prevalent approaches face notable constraints. These include substantial additional computational costs from pre-training/re-training in model compression and an extensive search space in architecture design. Additionally, managing complexity and adhering to memory constraints is problematic, especially in scenarios with strict time or space limitations. Addressing these issues, this paper introduces a novel learning paradigm, Dynamic Sparse Learning (DSL), tailored for recommendation models. DSL innovatively trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance and the model's sparsity distribution during the training. This approach ensures a consistent and minimal parameter budget throughout the full learning lifecycle, paving the way for "end-to-end" efficiency from training to inference. Our extensive experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance.
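The prune-and-grow cycle described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the magnitude-based pruning criterion, the random growth criterion, the exact form of the cosine schedule, and the random stand-in gradient are all assumptions chosen for illustration. Only the overall structure (sparse initialization, sparse learning, periodic exploration every $\Delta T$ steps with initial update ratio $\rho_0$, fixed parameter budget) comes from the summary.

```python
import math
import numpy as np

def init_mask(shape, density, rng):
    """Sparsity initialization: keep a random `density` fraction of weights."""
    n = int(np.prod(shape))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(density * n), replace=False)] = True
    return mask.reshape(shape)

def cosine_update_ratio(step, total_steps, rho0):
    """Assumed cosine annealing: rho0 at step 0, decaying to 0 at the end."""
    return 0.5 * rho0 * (1.0 + math.cos(math.pi * step / total_steps))

def prune_and_grow(weights, mask, ratio, rng):
    """Dynamic exploration under a fixed budget.

    Prunes the smallest-magnitude active weights and regrows the same
    number at positions that were inactive before this update.
    Random growth is an assumption; the paper's criterion may differ.
    """
    flat_w = weights.ravel().copy()
    flat_m = mask.ravel().copy()
    active = np.flatnonzero(flat_m)
    inactive = np.flatnonzero(~flat_m)
    k = min(int(ratio * active.size), inactive.size)
    if k > 0:
        # Prune: the k active weights with the smallest |w|.
        drop = active[np.argsort(np.abs(flat_w[active]))[:k]]
        flat_m[drop] = False
        flat_w[drop] = 0.0
        # Grow: k previously inactive positions, initialized at zero.
        grow = rng.choice(inactive, size=k, replace=False)
        flat_m[grow] = True
    return flat_w.reshape(weights.shape), flat_m.reshape(mask.shape)

# Toy run on a single 64 x 16 embedding table (hypothetical sizes).
rng = np.random.default_rng(0)
mask = init_mask((64, 16), density=0.1, rng=rng)
weights = rng.normal(size=(64, 16)) * mask
budget = int(mask.sum())

total_steps, delta_t, rho0 = 1000, 100, 0.3
for step in range(1, total_steps + 1):
    grad = rng.normal(size=weights.shape)   # stand-in for a real gradient
    weights -= 0.01 * grad * mask           # sparse learning: only active weights move
    if step % delta_t == 0 and step < total_steps:
        ratio = cosine_update_ratio(step, total_steps, rho0)
        weights, mask = prune_and_grow(weights, mask, ratio, rng)

print(budget, int(mask.sum()))  # the two counts are equal: the budget is preserved
```

Because every update removes and adds the same number of weights, the count of active parameters never changes, which is what keeps training cost and inference cost tied to the same budget in this sketch.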

Paper Structure

This paper contains 31 sections, 1 equation, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: The overview of the proposed Dynamic Sparse Learning (DSL) framework.
  • Figure 2: Performance comparisons at different sparsity levels. The star marks the extreme sparsity level at which DSL still achieves performance similar to the baseline.
  • Figure 3: Performance over three sparsity levels. (Left): Performance over different update intervals $\Delta T$; (Right): Performance over different initial update ratios $\rho_0$.
  • Figure 4: (Left): Sparsity distribution of the embeddings. Larger GroupIDs indicate more popular users or items. (Right): Embedding visualization at different popularity levels. Darker colors denote larger weight magnitudes; white denotes pruned weights.
  • Figure 5: Training-loss convergence of the baseline model, RP, and DSL under different initial update ratios $\rho_0$ and update intervals $\Delta T$.

Theorems & Definitions (1)

  • Definition 1: Winning Ticket