NASRec: Weight Sharing Neural Architecture Search for Recommender Systems
Tunhou Zhang, Dehua Cheng, Yuchen He, Zhengxing Chen, Xiaoliang Dai, Liang Xiong, Feng Yan, Hai Li, Yiran Chen, Wei Wen
TL;DR
NASRec introduces a weight-sharing neural architecture search framework tailored to recommender systems. It constructs a large, heterogeneous supernet with diverse operators and dense connectivity to handle multi-modality data. It tackles training inefficiency, operator imbalance, and ranking misalignment with single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning, then runs an evolutionary search to select the best subnet. Empirically, NASRecNet achieves state-of-the-art CTR prediction performance on three benchmarks (Criteo, Avazu, KDD Cup 2012), improving both log loss and AUC, while weight sharing substantially reduces search cost. On Criteo Terabyte, NASRecNet yields a further small log-loss improvement over baselines, demonstrating scalability to large-scale data, and analyses attribute notable FLOPs reductions to operator balancing and efficient path sampling. Overall, NASRec offers a practical path to fully automated architecture fabrication for recommender systems, requiring minimal human priors while delivering substantial performance gains.
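The final step above is an evolutionary search over subnets of the trained supernet. No specific evolutionary algorithm is named here, so the following is a minimal Python sketch of one common choice, regularized (aging) evolution. The integer architecture encoding, `mutate`, `regularized_evolution`, and the toy objective `toy_loss` are all hypothetical; in a weight-sharing setting, `score` would stand in for evaluating a subnet with weights inherited from the supernet (for example, its validation log loss).

```python
import collections
import random
from typing import Callable, Deque, List, Tuple

# Hypothetical encoding: one operator index per searchable block.
Arch = Tuple[int, ...]


def mutate(arch: Arch, num_ops: int, rng: random.Random) -> Arch:
    """Return a copy of `arch` with one block switched to a different operator."""
    genes = list(arch)
    pos = rng.randrange(len(genes))
    genes[pos] = rng.choice([op for op in range(num_ops) if op != genes[pos]])
    return tuple(genes)


def regularized_evolution(
    score: Callable[[Arch], float],  # lower is better, e.g. a validation log loss
    num_blocks: int,
    num_ops: int,
    population_size: int = 20,
    sample_size: int = 5,
    cycles: int = 200,
    seed: int = 0,
) -> Tuple[Arch, float]:
    """Aging evolution: tournament selection, single mutation, drop the oldest member."""
    rng = random.Random(seed)
    population: Deque[Tuple[Arch, float]] = collections.deque()
    history: List[Tuple[Arch, float]] = []

    # Initialize the population with uniformly random architectures.
    while len(population) < population_size:
        arch = tuple(rng.randrange(num_ops) for _ in range(num_blocks))
        entry = (arch, score(arch))
        population.append(entry)
        history.append(entry)

    for _ in range(cycles):
        # Tournament: the best of a random sample becomes the parent.
        tournament = rng.sample(list(population), sample_size)
        parent_arch, _parent_score = min(tournament, key=lambda e: e[1])
        child = mutate(parent_arch, num_ops, rng)
        entry = (child, score(child))
        population.append(entry)
        history.append(entry)
        population.popleft()  # aging: remove the oldest member, not the worst

    # Report the best architecture ever evaluated.
    return min(history, key=lambda e: e[1])


# Toy stand-in for scoring a subnet with inherited supernet weights.
TARGET: Arch = (2, 0, 1, 3)


def toy_loss(arch: Arch) -> float:
    return float(sum(a != t for a, t in zip(arch, TARGET)))


if __name__ == "__main__":
    best, loss = regularized_evolution(toy_loss, num_blocks=4, num_ops=4)
    print("best:", best, "loss:", loss)
```

Removing the oldest member rather than the worst keeps the population turning over, which is the usual rationale for aging evolution. `num_ops` must be at least 2 so that `mutate` always has an alternative operator to switch to.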
Abstract
The rise of deep neural networks offers new opportunities for optimizing recommender systems. However, optimizing recommender systems with deep neural networks requires delicate architecture fabrication. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures through weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i.e., search space) to search full architectures. The supernet incorporates a versatile choice of operators and dense connectivity to minimize the human effort needed to find priors. The scale and heterogeneity of NASRec impose several challenges, such as training inefficiency, operator imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our crafted models, NASRecNet, show promising results on three Click-Through Rate (CTR) prediction benchmarks, indicating that NASRec outperforms both manually designed models and existing NAS methods and achieves state-of-the-art performance. Our work is publicly available at https://github.com/facebookresearch/NasRec.
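Single-operator any-connection sampling can be illustrated with a minimal Python sketch under stated assumptions: the supernet is a stack of blocks, each sampled subnet activates exactly one operator per block from a candidate list, and each block may take input from any subset of earlier nodes (node 0 being the raw input features). The operator names, `BlockChoice`, and `sample_subnet` are hypothetical and are not taken from the NASRec codebase; keeping each connection with probability 0.5 is just one simple way to make every connection pattern reachable.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical candidate operators; the actual NASRec operator set may differ.
CANDIDATE_OPS = ["fc", "gating", "sum", "efc", "dot_product", "attention"]


@dataclass
class BlockChoice:
    op: str            # the single operator activated in this block
    inputs: List[int]  # earlier nodes feeding this block (0 = input features)


def sample_subnet(num_blocks: int, rng: random.Random) -> List[BlockChoice]:
    """Sample one subnet: a single operator per block, any non-empty set of connections."""
    subnet = []
    for b in range(1, num_blocks + 1):
        op = rng.choice(CANDIDATE_OPS)
        # Any connection: each earlier node 0..b-1 is independently kept with p = 0.5.
        candidates = list(range(b))
        inputs = [i for i in candidates if rng.random() < 0.5]
        if not inputs:
            # Guarantee at least one input so the block stays connected.
            inputs = [rng.choice(candidates)]
        subnet.append(BlockChoice(op=op, inputs=inputs))
    return subnet


if __name__ == "__main__":
    rng = random.Random(0)
    for step in range(3):
        subnet = sample_subnet(num_blocks=4, rng=rng)
        print(step, [(c.op, c.inputs) for c in subnet])
```

In a weight-sharing training loop, one would presumably draw a fresh subnet at each step and update only the shared weights along its path; activating a single operator per block keeps the cost of each step close to that of training one stand-alone model.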
