Table of Contents
Fetching ...

AdaS&S: a One-Shot Supernet Approach for Automatic Embedding Size Search in Deep Recommender System

He Wei, Yuekui Yang, Yang Zhang, Haiyang Wu, Meixi Liu, Shaoping Ma

TL;DR

This work proposes a novel one-shot AES framework called AdaS&S, in which a supernet encompassing various candidate embeddings is built and AES is performed as searching network architectures within it, and introduces the resource competition penalty to balance the model effectiveness and memory cost of embeddings.

Abstract

Deep Learning Recommendation Model(DLRM)s utilize the embedding layer to represent various categorical features. Traditional DLRMs adopt unified embedding size for all features, leading to suboptimal performance and redundant parameters. Thus, lots of Automatic Embedding size Search (AES) works focus on obtaining mixed embedding sizes with strong model performance. However, previous AES works can hardly address several challenges together: (1) The search results of embedding sizes are unstable; (2) Recommendation effect with AES results is unsatisfactory; (3) Memory cost of embeddings is uncontrollable. To address these challenges, we propose a novel one-shot AES framework called AdaS&S, in which a supernet encompassing various candidate embeddings is built and AES is performed as searching network architectures within it. Our framework contains two main stages: In the first stage, we decouple training parameters from searching embedding sizes, and propose the Adaptive Sampling method to yield a well-trained supernet, which further helps to produce stable AES results. In the second stage, to obtain embedding sizes that benefits the model effect, we design a reinforcement learning search process which utilizes the supernet trained previously. Meanwhile, to adapt searching to specific resource constraint, we introduce the resource competition penalty to balance the model effectiveness and memory cost of embeddings. We conduct extensive experiments on public datasets to show the superiority of AdaS&S. Our method could improve AUC by about 0.3% while saving about 20% of model parameters. Empirical analysis also shows that the stability of searching results in AdaS&S significantly exceeds other methods.

AdaS&S: a One-Shot Supernet Approach for Automatic Embedding Size Search in Deep Recommender System

TL;DR

This work proposes a novel one-shot AES framework called AdaS&S, in which a supernet encompassing various candidate embeddings is built and AES is performed as searching network architectures within it, and introduces the resource competition penalty to balance the model effectiveness and memory cost of embeddings.

Abstract

Deep Learning Recommendation Model(DLRM)s utilize the embedding layer to represent various categorical features. Traditional DLRMs adopt unified embedding size for all features, leading to suboptimal performance and redundant parameters. Thus, lots of Automatic Embedding size Search (AES) works focus on obtaining mixed embedding sizes with strong model performance. However, previous AES works can hardly address several challenges together: (1) The search results of embedding sizes are unstable; (2) Recommendation effect with AES results is unsatisfactory; (3) Memory cost of embeddings is uncontrollable. To address these challenges, we propose a novel one-shot AES framework called AdaS&S, in which a supernet encompassing various candidate embeddings is built and AES is performed as searching network architectures within it. Our framework contains two main stages: In the first stage, we decouple training parameters from searching embedding sizes, and propose the Adaptive Sampling method to yield a well-trained supernet, which further helps to produce stable AES results. In the second stage, to obtain embedding sizes that benefits the model effect, we design a reinforcement learning search process which utilizes the supernet trained previously. Meanwhile, to adapt searching to specific resource constraint, we introduce the resource competition penalty to balance the model effectiveness and memory cost of embeddings. We conduct extensive experiments on public datasets to show the superiority of AdaS&S. Our method could improve AUC by about 0.3% while saving about 20% of model parameters. Empirical analysis also shows that the stability of searching results in AdaS&S significantly exceeds other methods.

Paper Structure

This paper contains 20 sections, 8 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Base architecture of the DLRMs.
  • Figure 2: Overview of the proposed AdaS&S framework. The left part is the stage of training the Adaptive Supernet. The right part is the stage of RL-Search process.
  • Figure 3: Two schemes of supernet training.
  • Figure 4: Stability of AES. Each row represents the number of times a feature is assigned to various Emb-sizes in 100 searches.
  • Figure 5: Effect of resources competition penalty. P-R indicates the percentage of Parameters Reduce. Left part is the robustness test of $\lambda_{r}$ where $\lambda_{c}$ is set as 0.08. Right part is the robustness test of $\lambda_{c}$ where $\lambda_{r}$ is set as 0.005.
  • ...and 1 more figures