
MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest

Xiao Yang, Peifeng Yin, Abe Engle, Jinfeng Zhuang, Ling Leng

TL;DR

MTMD tackles the fragmentation of data and goals in Pinterest's lightweight ad ranking by introducing a two-tower, mixture-of-experts architecture that jointly handles multiple ad domains and prediction tasks. The Domain Expert blends task-specific, domain-shared, and task-shared knowledge, and is combined with a domain adaptation module and constrained modeling that aligns lightweight predictions with heavyweight signals. Offline results show 12–36% improvements in the predictive loss (LogMAE), online A/B tests show CPC reductions and CTR gains, and a single MTMD model has been deployed in production, replacing nine models. This approach reduces maintenance complexity while delivering scalable, cross-domain gains for both engagement- and conversion-oriented ads across multiple surfaces and products.
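
The Domain Expert described above mixes three kinds of experts. The following is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: "domain-shared" is read as one expert shared by all domains, "task-shared" as one expert per domain shared by that domain's tasks, and "task-specific" as one expert per (domain, task) pair, and their outputs are mixed by an MMoE-style softmax gate. The class name `DomainExpertSketch`, the single-layer experts, the domain names, and the random weights are all hypothetical.

```python
import math
import random


def make_layer(rng, n_out, n_in):
    # Hypothetical random init; in a trained model these weights are learned.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, relu=False):
    # Single dense layer y = Wx + b, optionally followed by ReLU.
    weights, bias = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y


def softmax(v):
    m = max(v)
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]


class DomainExpertSketch:
    """Gated mix of task-specific, task-shared and domain-shared experts (assumed layout)."""

    def __init__(self, domains, tasks, dim_in, dim_out, seed=0):
        rng = random.Random(seed)
        # One expert shared across all domains (common cross-domain knowledge).
        self.domain_shared = make_layer(rng, dim_out, dim_in)
        # Per domain: one expert shared by all tasks of that domain.
        self.task_shared = {d: make_layer(rng, dim_out, dim_in) for d in domains}
        # Per (domain, task): one task-specific expert.
        self.task_specific = {
            (d, t): make_layer(rng, dim_out, dim_in) for d in domains for t in tasks
        }
        # Per (domain, task): a gate producing 3 mixing logits, one per expert.
        self.gate = {(d, t): make_layer(rng, 3, dim_in) for d in domains for t in tasks}

    def forward(self, x, domain, task):
        experts = [
            dense(self.task_specific[(domain, task)], x, relu=True),
            dense(self.task_shared[domain], x, relu=True),
            dense(self.domain_shared, x, relu=True),
        ]
        mix = softmax(dense(self.gate[(domain, task)], x))
        # Convex combination of the expert outputs, one output dimension at a time.
        return [sum(g * e[k] for g, e in zip(mix, experts)) for k in range(len(experts[0]))]


model = DomainExpertSketch(["home_feed", "search"], ["ctr", "cvr"], dim_in=4, dim_out=3)
out = model.forward([0.1, -0.2, 0.3, 0.5], domain="search", task="cvr")
print(len(out))  # 3
```

The gate lets each (domain, task) pair decide how much to rely on shared versus specialized knowledge; the shallow per-task expert shown on the left of Figure 3 is omitted here for brevity.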

Abstract

The lightweight ad ranking layer, which sits after the retrieval stage and before the fine ranker, plays a critical role in the success of a cascaded ad recommendation system. The optimization task depends on the ad domain, e.g., Click Through Rate (CTR) for click ads and Conversion Rate (CVR) for conversion ads; in addition, ads are served on multiple surfaces (home feed, search, or related item recommendation) and come from diverse ad products (shopping or standard ads). How to optimize all of these jointly and holistically in the lightweight ranker, so that platform value, advertiser value, and user value are maximized, is an inherently challenging problem in industry. Deep Neural Network (DNN)-based multitask learning (MTL) handles multiple goals naturally, with each prediction head mapping to a particular optimization goal. In practice, however, it is unclear how to unify data from different surfaces and ad products into a single model. To make MTL effective, it is critical to learn domain-specialized knowledge and to explicitly transfer knowledge between domains. We present a Multi-Task Multi-Domain (MTMD) architecture under the classic Two-Tower paradigm, with the following key contributions: 1) we handle different prediction tasks, ad products, and ad serving surfaces in a unified framework; 2) we propose a novel mixture-of-experts architecture that learns both specialized knowledge for each domain and common knowledge shared between domains; 3) we propose a domain adaptation module to encourage knowledge transfer between experts; 4) we constrain the modeling of different prediction tasks. MTMD improves the offline loss value by 12% to 36%, which maps to a 2% reduction in online cost per click. We have deployed this single MTMD framework into production for Pinterest ad recommendation, replacing 9 production models.
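
A Two-Tower model with one prediction head per optimization goal can be sketched as follows. This is an illustrative assumption rather than the paper's code: ad-tower embeddings are taken as precomputed, the user tower is assumed to emit one embedding per task head, and each ad is scored only by the head that matches its goal (CTR for click ads, CVR for conversion ads). The names `HEAD_FOR_GOAL` and `predict_heads` and all numbers are hypothetical.

```python
import math

# Hypothetical mapping from an ad's optimization goal to the head that scores it.
HEAD_FOR_GOAL = {"click": "ctr", "conversion": "cvr"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_heads(user_heads, ads):
    """Score each candidate ad with the head that matches its goal.

    user_heads: task name -> user-tower embedding for that head (one per request).
    ads: iterable of (ad_id, goal, ad_embedding); ad embeddings come from the
    ad tower and can be computed offline and cached.
    Returns ad_id -> (head, predicted probability).
    """
    preds = {}
    for ad_id, goal, ad_emb in ads:
        head = HEAD_FOR_GOAL[goal]
        preds[ad_id] = (head, sigmoid(dot(user_heads[head], ad_emb)))
    return preds


user_heads = {"ctr": [0.5, -0.1, 0.2], "cvr": [0.1, 0.4, -0.3]}
ads = [("ad_1", "click", [1.0, 0.0, 0.5]), ("ad_2", "conversion", [0.2, 1.0, 0.0])]
preds = predict_heads(user_heads, ads)
print(preds["ad_1"][0], round(preds["ad_1"][1], 3))  # ctr 0.646
```

Because ad embeddings do not depend on the request, they can be cached, so per-request work reduces to one dot product per candidate; this is the fast-inference property that Figure 1 attributes to two-tower models. Probabilities from different heads are not directly comparable, so the sketch stops at per-ad predictions.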

Paper Structure

This paper contains 17 sections, 1 equation, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: The throughput of each stage in a typical ad delivery funnel. A two-tower model has the advantage of fast inference at stages where throughput is high.
  • Figure 2: The design of the ad Domain Adaptation module in MTMD based on the Squeeze-and-Excitation block.
  • Figure 3: Design of the Domain Expert. Left: a shallow expert per task. Right: after the features are processed by the Domain Adaptation module, they pass through the task-specific Deep Expert, the Task-Shared Expert, and the Domain-Shared Expert.
  • Figure 4: The two-tower architecture based on the Domain Expert.
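
Figure 2 bases the Domain Adaptation module on the Squeeze-and-Excitation (SE) block. A minimal pure-Python sketch of an SE block over feature-field embeddings follows. Making it domain-aware by keeping a separate set of excitation weights per domain is an assumption for illustration, as are the names `DomainSEBlock` and `reduction`, the domain names, and the random initialization.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def random_matrix(rng, rows, cols):
    # Hypothetical random init; a trained model learns these weights.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DomainSEBlock:
    """Squeeze-and-Excitation over feature fields, with domain-specific
    excitation weights (the per-domain split is an assumption)."""

    def __init__(self, domains, n_fields, reduction=2, seed=0):
        rng = random.Random(seed)
        hidden = max(1, n_fields // reduction)
        self.w1 = {d: random_matrix(rng, hidden, n_fields) for d in domains}
        self.w2 = {d: random_matrix(rng, n_fields, hidden) for d in domains}

    def field_weights(self, fields, domain):
        # Squeeze: summarize each field embedding by its mean value.
        z = [sum(f) / len(f) for f in fields]
        # Excitation: bottleneck MLP (ReLU, then sigmoid) gives one weight per field.
        h = [max(0.0, v) for v in matvec(self.w1[domain], z)]
        return [sigmoid(v) for v in matvec(self.w2[domain], h)]

    def forward(self, fields, domain):
        # Re-scale: multiply every field embedding by its weight in (0, 1).
        weights = self.field_weights(fields, domain)
        return [[w * v for v in f] for w, f in zip(weights, fields)]


se = DomainSEBlock(["home_feed", "search"], n_fields=4)
fields = [[0.2, 0.4], [1.0, -1.0], [0.3, 0.3], [-0.5, 0.1]]
adapted = se.forward(fields, "search")
```

In the original SE design the re-weighting lets a network emphasize informative channels; here the "channels" are feature fields, so under this assumption each domain can learn to up- or down-weight the same features differently before they reach the experts.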