DistDNAS: Search Efficient Feature Interactions within 2 Hours

Tunhou Zhang, Wei Wen, Igor Fedorov, Xi Liu, Buyun Zhang, Fangqiu Han, Wen-Yen Chen, Yiping Han, Feng Yan, Hai Li, Yiran Chen

TL;DR

DistDNAS tackles the high cost of designing feature interactions for large-scale click-through rate (CTR) prediction by combining a differentiable supernet, a search distributed across multiple data days, and a cost-aware regularization objective. The search is accelerated by running it on each data day separately and aggregating the resulting architecture weights, which yields about a 25x speed-up and cuts end-to-end search time from 2 days to 2 hours. The cost-aware regularizer penalizes expensive modules, which prunes redundant interaction modules, improves the trade-off between FLOPs and NE, and produces a leaner interaction design that is cheaper to serve. Evaluated on the 1TB Criteo Terabyte dataset, DistDNAS delivers a small AUC gain (~0.001) and a substantial ~60% FLOPs reduction, advancing the state-of-the-art Pareto frontier for CTR prediction models.

Abstract

Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design incurs extensive cost because of the sequential workflow over a large volume of data. In addition, fusing interactions of various sources, orders, and mathematical operations introduces potential conflicts and additional redundancy in recommender models, leading to sub-optimal trade-offs between performance and serving cost. In this paper, we present DistDNAS as a neat solution for swift and efficient feature interaction design. DistDNAS proposes a supernet that incorporates interaction modules of varying orders and types as a search space. To optimize search efficiency, DistDNAS distributes the search and aggregates the choice of optimal interaction modules across varying data dates, achieving over 25x speed-up and reducing search cost from 2 days to 2 hours. To optimize serving efficiency, DistDNAS introduces a differentiable cost-aware loss that penalizes the selection of redundant interaction modules, enhancing the serving efficiency of the discovered feature interactions. We extensively evaluate the best models crafted by DistDNAS on the 1TB Criteo Terabyte dataset. Experimental evaluations demonstrate a 0.001 AUC improvement and a 60% FLOPs saving over current state-of-the-art CTR models.
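
The distributed search described in the abstract can be illustrated with a minimal Python sketch. It assumes that each data day is searched independently and produces one architecture weight per candidate interaction module, and that aggregation is a plain average followed by a threshold. The averaging rule, the 0.5 threshold, the function names, and the example weights are all illustrative assumptions, not details taken from the paper.

```python
from statistics import mean


def aggregate_architecture_weights(per_day_weights):
    """Average each module's architecture weight over all data days.

    per_day_weights: list of lists, one inner list per data day,
    holding one weight per candidate interaction module (assumed layout).
    """
    num_modules = len(per_day_weights[0])
    if any(len(day) != num_modules for day in per_day_weights):
        raise ValueError("every day must score the same set of modules")
    return [
        mean(day[m] for day in per_day_weights)
        for m in range(num_modules)
    ]


def select_modules(avg_weights, threshold=0.5):
    """Keep the modules whose averaged weight reaches the (assumed) threshold."""
    return [i for i, w in enumerate(avg_weights) if w >= threshold]


# Hypothetical weights for 4 candidate modules, searched on 3 separate data days.
per_day = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.4, 0.7, 0.3],
    [0.7, 0.3, 0.5, 0.2],
]
avg = aggregate_architecture_weights(per_day)
selected = select_modules(avg)
print(selected)
```

Because each day's search is independent in this sketch, the per-day runs could execute in parallel, with only the small weight vectors exchanged at the end; that is the intuition behind replacing a sequential pass over all days with a distributed one.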

Paper Structure (15 sections, 9 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Model AUC versus FLOPs on Criteo Terabyte. DistDNAS achieves a 0.001 AUC improvement over state-of-the-art recommender models.
  • Figure 2: Feature interaction search space for each choice block in DistDNAS. Here, a dashed line denotes a searchable feature interaction in DistDNAS, and $\otimes$ denotes the mixing of different feature interaction modules.
  • Figure 3: Overview of DistDNAS methodology. Here, dashed lines denote searchable interaction modules, and the size of interaction modules indicates the cost penalty applied to each interaction module for serving efficiency.
  • Figure 4: QPS comparison between DistDNAS and DNAS.
  • Figure 5: Normalized cost importance in a 7-block supernet.
  • ...and 2 more figures
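
The cost penalty mentioned in the captions of Figures 3 and 5 can be sketched as a differentiable term added to the training loss. The sketch below is written under stated assumptions: each module has an architecture logit that is squashed by a sigmoid into a soft selection probability, each module's FLOPs are normalized to sum to one, and the penalty is the probability-weighted sum of those normalized costs. The exact form of the loss, the `strength` coefficient, and all names here are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Map an architecture logit to a soft selection probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_costs(flops):
    """Scale per-module FLOPs so they sum to 1 (assumed normalization)."""
    total = sum(flops)
    return [f / total for f in flops]


def cost_aware_loss(task_loss, arch_logits, flops, strength=0.1):
    """Task loss plus a penalty that grows when expensive modules are selected."""
    costs = normalize_costs(flops)
    penalty = sum(sigmoid(a) * c for a, c in zip(arch_logits, costs))
    return task_loss + strength * penalty


# Hypothetical FLOPs for three candidate modules: cheap, medium, expensive.
flops = [1.0, 2.0, 7.0]
base = cost_aware_loss(0.4, [0.0, 0.0, 0.0], flops)
favor_cheap = cost_aware_loss(0.4, [2.0, 0.0, 0.0], flops)
favor_expensive = cost_aware_loss(0.4, [0.0, 0.0, 2.0], flops)
print(base, favor_cheap, favor_expensive)
```

In this sketch, raising the logit of the expensive module increases the loss more than raising the logit of the cheap one by the same amount, so gradient descent is nudged toward dropping costly modules unless they pay for themselves in task loss.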