Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta
Wei Zhang, Dai Li, Chen Liang, Fang Zhou, Zhongke Zhang, Xuewei Wang, Ru Li, Yi Zhou, Yaning Huang, Dong Liang, Kai Wang, Zhangyuan Wang, Zhengxing Chen, Fenggang Wu, Minghai Chen, Huayu Li, Yunnan Wu, Zhan Shu, Mindi Yuan, Sri Reddy
TL;DR
This paper tackles the challenge of scaling high-quality user representations across hundreds of ads models in a large-scale system. It introduces SUM, an upstream-downstream framework in which a few powerful upstream user models generate embeddings that downstream production models consume, supported by the online SOAP platform for latency-free, asynchronous embedding inference. The authors detail a DLRM-inspired upstream architecture with a user tower and a mix tower, along with multiple interaction modules (MLP, DCN, and MLP-Mixer), designed to maximize training throughput while preserving embedding usefulness. They address embedding distribution shift with averaging strategies and storage optimizations, and report significant offline and online gains in Meta’s production system, including improved metrics with modest capacity increases. The work provides practical deployment lessons and shows that scalable, shared user representations can substantially improve personalization efficiency and effectiveness at scale.
Abstract
Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representations across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed the SUM Online Asynchronous Platform (SOAP), a latency-free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched in hundreds of ads ranking models at Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and improved infrastructure efficiency.
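The serving pattern described above (downstream models read a stored user embedding without waiting, while upstream inference runs off the request path and refreshes the stored value) can be illustrated with a minimal Python sketch. It is not the SOAP implementation: the class `AsyncEmbeddingStore`, the method names, the in-memory queue standing in for asynchronous execution, the zero vector returned for unseen users, and the fixed-size averaging window are all assumptions made for illustration. The source states only that inference is asynchronous and that averaging is used to stabilize embeddings.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = List[float]


class AsyncEmbeddingStore:
    """Hypothetical read-then-refresh store for user embeddings."""

    def __init__(self, upstream_model: Callable[[Sequence[float]], Embedding],
                 dim: int, window: int = 2) -> None:
        self._model = upstream_model      # stand-in for an upstream user model
        self._dim = dim
        self._window = window             # assumed: average the last `window` embeddings (>= 1)
        self._history: Dict[str, List[Embedding]] = {}
        self._pending: List[Tuple[str, Sequence[float]]] = []

    def read(self, user_id: str) -> Embedding:
        """Return the averaged stored embedding; no model inference happens here."""
        history = self._history.get(user_id)
        if not history:
            return [0.0] * self._dim      # assumed default for users with no embedding yet
        return [sum(values) / len(history) for values in zip(*history)]

    def on_request(self, user_id: str, features: Sequence[float]) -> Embedding:
        """Serve the current embedding immediately and queue a refresh for later."""
        embedding = self.read(user_id)
        self._pending.append((user_id, features))
        return embedding

    def flush(self) -> None:
        """Run queued upstream inference; a real system would do this off the request path."""
        for user_id, features in self._pending:
            history = self._history.setdefault(user_id, [])
            history.append(self._model(features))
            del history[:-self._window]   # keep only the most recent `window` embeddings
        self._pending.clear()


def toy_upstream_model(features: Sequence[float]) -> Embedding:
    """Toy model for the example only: embeds features as [sum, count]."""
    return [float(sum(features)), float(len(features))]


store = AsyncEmbeddingStore(toy_upstream_model, dim=2, window=2)
first = store.on_request("u1", [1, 2])    # nothing stored yet, default is served
store.flush()
second = store.on_request("u1", [3, 5])   # serves the embedding from the first request
store.flush()
third = store.read("u1")                  # average of the two stored embeddings
```

In this sketch a request never waits for the upstream model, so each downstream read sees an embedding computed from an earlier request. Averaging over recent embeddings is one simple way to damp abrupt changes when the upstream model is updated.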
