NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

Tunhou Zhang; Dehua Cheng; Yuchen He; Zhengxing Chen; Xiaoliang Dai; Liang Xiong; Feng Yan; Hai Li; Yiran Chen; Wei Wen

TL;DR

NASRec introduces a weight-sharing neural architecture search framework tailored for recommender systems by constructing a large, heterogeneous supernet that encompasses diverse operators and dense connectivity to handle multi-modality data. It tackles training inefficiency and ranking misalignment with single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning, followed by evolutionary search to select the best subnet. Empirically, NASRecNet achieves state-of-the-art CTR performance on three benchmarks (Criteo, Avazu, KDD Cup 2012) with log-loss improvements and AUC gains, while significantly reducing search costs via weight sharing. On Criteo Terabyte, NASRecNet yields an additional small log-loss improvement over baselines, demonstrating scalability to large-scale data, and analyses show notable FLOPs reductions attributed to operator-balancing and efficient path sampling. Overall, NASRec demonstrates a practical path to fully automated architecture fabrication for recommender systems with minimal human priors and substantial performance gains.

Abstract

The rise of deep neural networks offers new opportunities in optimizing recommender systems. However, optimizing recommender systems using deep neural networks requires delicate architecture fabrication. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures by weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i.e., search space) to search the full architectures. The supernet incorporates a versatile choice of operators and dense connectivity to minimize the human effort of finding priors. The scale and heterogeneity in NASRec impose several challenges, such as training inefficiency, operator imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our crafted models, NASRecNet, show promising results on three Click-Through Rate (CTR) prediction benchmarks, indicating that NASRec outperforms both manually designed models and existing NAS methods with state-of-the-art performance. Our work is publicly available at https://github.com/facebookresearch/NasRec.
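
As a rough illustration of what single-operator any-connection sampling could look like, the sketch below draws one subnet from a toy supernet description. It is not the authors' implementation: the operator pools (apart from the names EFC and Dot-Product), the 0.5 keep probability, the "at least one input" rule, and the helper name `sample_path` are all assumptions made for this example.

```python
import random

# Placeholder operator pools (assumption): only "EFC" and "Dot-Product"
# are named in the text; the other entries are stand-ins.
DENSE_OPS = ["FC", "Dot-Product", "dense_op_a"]
SPARSE_OPS = ["EFC", "sparse_op_a"]


def sample_path(num_blocks, rng):
    """Sample one subnet from a toy supernet.

    Single-operator: each block gets exactly one dense and one sparse operator.
    Any-connection: each block may read from any subset of earlier outputs.
    """
    path = []
    for block in range(num_blocks):
        # Index 0 stands for the raw input; 1..block are earlier blocks.
        candidates = list(range(block + 1))
        # Keep each candidate independently (0.5 is an assumed probability).
        inputs = [c for c in candidates if rng.random() < 0.5]
        if not inputs:
            # Assumed rule: a block must have at least one input.
            inputs = [rng.choice(candidates)]
        path.append({
            "dense_op": rng.choice(DENSE_OPS),
            "sparse_op": rng.choice(SPARSE_OPS),
            "inputs": inputs,
        })
    return path


rng = random.Random(0)
subnet = sample_path(num_blocks=4, rng=rng)
for index, block in enumerate(subnet):
    print(index, block)
```

In a weight-sharing setup, a path like this would typically be resampled at every training step so that only the chosen operators and connections receive gradient updates for that step.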

Paper Structure (19 sections, 4 equations, 12 figures, 7 tables)

Introduction
Related Work
Hierarchical NASRec Space for Recommender Systems
NASRec Search Space
Search Components
Weight sharing Neural Architecture Search for Recommender Systems
Single-operator Any-Connection Sampling
Operator-Balancing Interaction Modules
Post-training Fine-tuning
Evolutionary Search on Best Models
Experiments
Search Configuration
Recommender System Benchmark Results
Discussion
Conclusion
...and 4 more sections

Figures (12)

Figure 1: Overview of NASRec search space. NASRec search space enables a full architecture search on building operators and dense connectivity. Here, "blue" blocks produce a dense output, and "red" blocks produce a sparse output.
Figure 2: We propose Single-operator Any-connection path sampling by combining the advantages of the first two sampling strategies. Here, dashed connections and operators denote a sampled path in the supernet.
Figure 3: Ranking evaluation of various path sampling strategies on NASRec-Full supernet. We evaluate all ranking coefficients over 100 randomly sampled subnets on Criteo.
Figure 4: Operator-balancing interaction inserts a simple EFC layer before Dot-Product to ensure linear parameter consumption and balance building operators.
Figure 5: Best model discovered on Avazu @ NASRec-Small.
...and 7 more figures
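
The "linear parameter consumption" mentioned in the Figure 4 caption can be made concrete with a small parameter count. The sketch below is only a back-of-the-envelope model: it assumes the Dot-Product produces one value per pair of embeddings, that a fully connected layer follows it, that the EFC is an `N x k` mixing matrix over the embedding count, and that `k` is chosen as `floor(sqrt(2N))`. None of these specifics are stated in the text above, and the function names are hypothetical.

```python
import math


def dot_product_pairs(num_embeddings):
    # Number of distinct pairwise interactions among the embeddings.
    return num_embeddings * (num_embeddings - 1) // 2


def fc_params_after_dot(num_embeddings, out_dim):
    # An FC layer on top of all pairwise terms: grows quadratically in N.
    return dot_product_pairs(num_embeddings) * out_dim


def balanced_params(num_embeddings, out_dim):
    # Assumed projection width: k = floor(sqrt(2N)), so k*(k-1)/2 < N.
    k = math.isqrt(2 * num_embeddings)
    efc = num_embeddings * k  # assumed N x k mixing matrix, no bias
    return efc + dot_product_pairs(k) * out_dim


print(fc_params_after_dot(32, 64))  # 31744
print(balanced_params(32, 64))      # 2048
```

Under these assumptions, projecting to roughly `sqrt(2N)` embeddings before the interaction keeps the number of pairwise terms below `N`, so the following layer grows linearly rather than quadratically with the number of input embeddings.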
