
ML-DCN: Masked Low-Rank Deep Crossing Network Towards Scalable Ads Click-through Rate Prediction at Pinterest

Jiacheng Li, Yixiong Meng, Yi Wu, Yun Zhao, Sharare Zehtabian, Jiayin Jin, Degao Peng, Jinfeng Zhuang, Qifei Shen, Kungang Li

TL;DR

This work tackles CTR prediction for large-scale Pinterest ads under fixed serving budgets, where naive scaling of interaction modules saturates offline gains. It introduces ML-DCN, a masked low-rank deep crossing network that uses an instance-guided mask inside a low-rank crossing space to amplify salient interactions on a per-example basis. Offline experiments show ML-DCN achieves higher AUC at matched FLOPs and a more favorable AUC–FLOPs trade-off than DCNv2, MaskNet, WuKong, and RankMixer. Online A/B tests report statistically significant CTR improvements (e.g., +1.89% platform CTR) with neutral serving cost, and the model has been deployed in production. The results demonstrate that per-example masking within a low-rank interaction space enables higher-order interactions with practical compute budgets.

Abstract

Deep learning recommendation systems rely on feature interaction modules to model complex user-item relationships across sparse categorical and dense features. In large-scale ad ranking, increasing model capacity is a promising path to improving both predictive performance and business outcomes, yet production serving budgets impose strict constraints on latency and FLOPs. This creates a central tension: we want interaction modules that both scale effectively with additional compute and remain compute-efficient at serving time. In this work, we study how to scale feature interaction modules under a fixed serving budget. We find that naively scaling DCNv2 and MaskNet, despite their widespread adoption in industry, yields rapidly diminishing offline gains in the Pinterest ads ranking system. To overcome these limitations, we propose ML-DCN, an interaction module that integrates an instance-conditioned mask into a low-rank crossing layer, enabling per-example selection and amplification of salient interaction directions while maintaining efficient computation. This novel architecture combines the strengths of DCNv2 and MaskNet, scales efficiently with increased compute, and achieves state-of-the-art performance. Experiments on a large internal Pinterest ads dataset show that ML-DCN achieves higher AUC than DCNv2, MaskNet, and recent scaling-oriented alternatives at matched FLOPs, and that it scales more favorably overall as compute increases, exhibiting a stronger AUC-FLOPs trade-off. Finally, online A/B tests demonstrate statistically significant improvements in key ads metrics (including CTR and click-quality measures), and ML-DCN has been deployed in the production system with neutral serving cost.

Paper Structure (14 sections, 10 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Architecture of an ML-DCN block. Given input $X_{l-1}$, the block projects inputs into a low-rank interaction space, applies an instance-guided mask computed from $X_{l-1}$, maps the masked interactions back to the original input space and crosses with $X_0$, then adds a residual connection and LayerNorm to produce $X_l$.
  • Figure 2: The AUC gain versus FLOPs curve.
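
The order of operations in the Figure 1 caption can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The caption only fixes the sequence of steps, so the parameter names (`V`, `U`, `b`, `M1`, `M2`), the two-layer ReLU mask network, the bias placement, the residual being taken from $X_{l-1}$, and the LayerNorm without learned scale or shift are all assumptions made for the example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def init_params(d, r, hidden, rng):
    """Hypothetical parameters for one block; names and shapes are assumptions."""
    scale = 0.1
    return {
        "V": rng.normal(0.0, scale, (d, r)),        # down-projection, d -> r
        "U": rng.normal(0.0, scale, (r, d)),        # up-projection, r -> d
        "b": np.zeros(d),                           # bias after the up-projection
        "M1": rng.normal(0.0, scale, (d, hidden)),  # mask network, first layer
        "M2": rng.normal(0.0, scale, (hidden, r)),  # mask network, second layer
    }


def ml_dcn_block(x_prev, x0, p):
    """One masked low-rank crossing step following the Figure 1 caption."""
    low = x_prev @ p["V"]                               # project to low-rank space: (batch, r)
    mask = np.maximum(x_prev @ p["M1"], 0.0) @ p["M2"]  # instance-guided mask: (batch, r)
    back = (low * mask) @ p["U"] + p["b"]               # map masked interactions back: (batch, d)
    crossed = x0 * back                                 # element-wise cross with X_0
    return layer_norm(crossed + x_prev)                 # residual connection, then LayerNorm


rng = np.random.default_rng(0)
d, r, hidden, batch = 16, 4, 8, 3
params = init_params(d, r, hidden, rng)
x0 = rng.normal(size=(batch, d))
x1 = ml_dcn_block(x0, x0, params)
x2 = ml_dcn_block(x1, x0, params)  # parameters reused here only for brevity
print(x2.shape)  # (3, 16)
```

Under these assumptions, the two projections hold 2·d·r weights instead of the d·d of a full-rank crossing matrix, and the mask is applied in the r-dimensional space, so its per-example cost depends on r and the hidden width of the mask network rather than on d squared.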