
Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

Wei Zhang, Dai Li, Chen Liang, Fang Zhou, Zhongke Zhang, Xuewei Wang, Ru Li, Yi Zhou, Yaning Huang, Dong Liang, Kai Wang, Zhangyuan Wang, Zhengxing Chen, Fenggang Wu, Minghai Chen, Huayu Li, Yunnan Wu, Zhan Shu, Mindi Yuan, Sri Reddy

TL;DR

This paper tackles the challenge of scaling high-quality user representations across hundreds of ads models in a large-scale system. It introduces SUM, an upstream-downstream framework in which a few powerful upstream user models generate embeddings that downstream production models consume as input features. SUM is supported by SOAP, an online platform for latency-free, asynchronous embedding inference. The authors detail a DLRM-inspired upstream architecture made up of a user tower and a mix tower, with interaction modules that combine several feature extractors (MLP, DCN, and MLP-Mixer), designed to maximize training throughput while preserving embedding usefulness. They address embedding distribution shift with averaging strategies, describe storage optimizations, and report significant offline and online gains in Meta’s production system with only modest capacity increases. The work shares practical deployment lessons and shows that scalable, shared user representations can substantially improve both the efficiency and the effectiveness of personalization.

Abstract

Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representations across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed the SUM Online Asynchronous Platform (SOAP), a latency-free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and improved infrastructure efficiency.
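
The "embedding stabilization" mentioned above can be pictured with a small sketch. This section does not spell out the scheme, so the moving-window mean below is an assumption made for illustration, and the class name `EmbeddingSmoother` and its `window` parameter are hypothetical. The idea shown is only that averaging a user's most recent embeddings dampens the jump a downstream model sees when the upstream user model is refreshed.

```python
from collections import deque


class EmbeddingSmoother:
    """Hypothetical helper: averages the last `window` embeddings per user.

    This is an illustrative assumption, not the stabilization scheme used by SUM.
    """

    def __init__(self, window=3):
        self.window = window
        self.history = {}  # user_id -> deque of recent embeddings

    def update(self, user_id, embedding):
        # Keep only the most recent `window` embeddings for this user.
        buf = self.history.setdefault(user_id, deque(maxlen=self.window))
        buf.append(list(embedding))
        # Element-wise mean over the embeddings currently in the window.
        n = len(buf)
        return [sum(values) / n for values in zip(*buf)]


smoother = EmbeddingSmoother(window=2)
e1 = smoother.update("u1", [1.0, 3.0])  # only one embedding seen so far
e2 = smoother.update("u1", [3.0, 5.0])  # mean of the two embeddings
e3 = smoother.update("u1", [5.0, 7.0])  # oldest embedding has dropped out
```

A larger window smooths more but makes the served embedding react more slowly to genuine changes in user behavior, so the window size is a trade-off between stability and freshness.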

Paper Structure

This paper contains 28 sections, 9 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: An overview of the proposed SUM framework. SUM envisions the following state: a few dedicated user models consume a vast amount of user-side features with advanced user modeling techniques and produce an embedding representation for each user. User models can be trained with multiple supervision signals (click, conversion, etc.) and support recurring snapshot updates. Multiple downstream models can safely consume the user model output (i.e., SUM user embeddings) as input features. As a result, the gain from the user model adds up across all the downstream models.
  • Figure 2: An illustration of the SUM upstream model architecture. The SUM user tower consumes a massive amount of user features and outputs a few user embeddings, which are then fed to the mix tower. The user tower is the core of the upstream model and has a pyramid architecture with residual connections to learn user representations gradually. Its basic building block, the Interaction Module, consists of various feature extractors in parallel to capture different feature interactions.
  • Figure 3: An illustration of SOAP, the online serving system for SUM, which leverages our proposed Async Serving paradigm.
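
The Async Serving paradigm named in Figure 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SOAP implementation: it assumes that a request reads whatever embedding is already stored for the user, and that the fresh embedding is computed in the background and written back for later requests, which is one way the user model can stay off the request's critical path. The names `AsyncEmbeddingServer`, `handle_request`, and `toy_user_model`, and the in-memory dict standing in for a feature store, are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncEmbeddingServer:
    """Hypothetical sketch of an async-serving loop (not the SOAP code)."""

    def __init__(self, user_model, dim, max_workers=2):
        self.user_model = user_model   # callable: features -> list of floats
        self.default = [0.0] * dim     # used when no embedding is stored yet
        self.store = {}                # in-memory stand-in for a feature store
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def _refresh(self, user_id, features):
        # Runs in a worker thread: compute and write back the new embedding.
        self.store[user_id] = self.user_model(features)

    def handle_request(self, user_id, features):
        # 1) Read the embedding that is already stored; no model call here.
        embedding = self.store.get(user_id, self.default)
        # 2) Schedule the refresh so it does not block this request.
        future = self.pool.submit(self._refresh, user_id, features)
        return embedding, future


def toy_user_model(features):
    # Toy stand-in for an upstream user model (assumption for the demo).
    total = float(sum(features))
    return [total, total / len(features)]


server = AsyncEmbeddingServer(toy_user_model, dim=2)
first, pending = server.handle_request("u1", [1, 2, 3])
pending.result()   # wait only so that the demo is deterministic
second, pending = server.handle_request("u1", [4, 5, 6])
pending.result()
server.pool.shutdown()
```

In this sketch the first request for a user receives the default embedding, and every later request receives the embedding computed from the previous request's features. The served embedding is therefore one request stale, which is the price paid for keeping model inference out of the serving latency.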