Ads Recommendation in a Collapsed and Entangled World

Junwei Pan; Wei Xue; Ximei Wang; Haibin Yu; Xun Liu; Shijie Quan; Xueming Qiu; Dapeng Liu; Lei Xiao; Jie Jiang

TL;DR

This paper analyzes Tencent's ads recommender through the lens of representation learning, focusing on three problems: preserving priors for diverse feature types, mitigating embedding dimensional collapse, and disentangling user interests across tasks and scenarios. For feature encoding, it presents the Temporal Interest Module (TIM) for sequence features, Multiple Numeral Systems Encoding (MNSE) for numeric features, and Similarity Encoding for pre-trained embeddings. To counter dimensional collapse and scale capacity, it describes the Multi-Embedding paradigm (ME), GwPFM, and collapse-resilient interactions. For disentanglement, the authors introduce Shared and Task-specific Embedding (STEM), Asymmetric Multi-Embedding (AME), and STEM-AL for auxiliary learning, and report consistent online gains across CTR, CVR, and LTV tasks, especially for smaller or low-resource tasks. The work also covers training enhancements (ranking losses, online learning, REW, exploration with uncertainty) and practical analysis tools for measuring feature correlations, dimensional collapse, and interest entanglement on Tencent's online advertising platform.

Abstract

We present Tencent's ads recommendation system and examine the challenges and practices of learning appropriate recommendation representations. Our study begins by showcasing our approaches to preserving prior knowledge when encoding features of diverse types into embedding representations. We specifically address sequence features, numeric features, and pre-trained embedding features. Subsequently, we delve into two crucial challenges related to feature representation: the dimensional collapse of embeddings and the interest entanglement across different tasks or scenarios. We propose several practical approaches to address these challenges that result in robust and disentangled recommendation representations. We then explore several training techniques to facilitate model optimization, reduce bias, and enhance exploration. Additionally, we introduce three analysis tools that enable us to study feature correlation, dimensional collapse, and interest entanglement. This work builds upon the continuous efforts of Tencent's ads recommendation team over the past decade. It summarizes general design principles and presents a series of readily applicable solutions and analysis tools. The reported performance is based on our online advertising platform, which handles hundreds of billions of requests daily and serves millions of ads to billions of users.

Paper Structure (37 sections, 10 equations, 4 figures)

Introduction
Brief System Overview
Feature Encoding
Sequence Features
Deployment Details
Numeric Features
Deployment Details
Embedding Features
Deployment Details
Tackling Dimensional Collapse
Embedding Dimensional Collapse
Multi-Embedding Paradigm
Deployment Details
GwPFM: Yet Another Simplified Approach to Multi-Embedding Paradigm
Deployment Details
...and 22 more sections

Figures (4)

Figure 1: Architecture of our Heterogeneous Mixture-of-Experts with Multi-Embedding for single-task learning, which consists of four key modules: feature encoding, multi-embedding lookup, experts (feature interactions and MLPs), and classification towers.
Figure 2: Illustration of Temporal Interest Module (left) for sequence features and Multiple Numeral Systems Encoding (right) for numeric and pre-trained embedding features.
Figure 3: Illustration of interest entanglement between tasks in single-embedding based MTL models and disentanglement in STEM. It shows the distance distribution of the contradictory user-item pair set $S$ (solid color) and of the whole user-item pair set (hatched) for (a) the single-task Like embedding, (b) the single-task Finish embedding, (c) the PLE shared embedding, and, in STEM, (d) the Like-specific embedding, (e) the Finish-specific embedding, and (f) the shared embedding.
Figure 4: Architecture illustration of various paradigms. Multi-Embedding (ME) is for single-task learning and doesn't disentangle representations. Shared and Task-specific Embedding (STEM) and Asymmetric Multi-Embedding (AME) are both for multi-task learning. STEM disentangles representations via task-specific embeddings, while AME achieves disentanglement by learning multiple embedding tables with different embedding sizes. STEM for Auxiliary Learning (STEM-AL) targets auxiliary learning: it learns a task-specific embedding for the main task and a shared embedding updated by multiple tasks.
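
Figure 2 names Multiple Numeral Systems Encoding (MNSE) without describing how it works. As a rough sketch only, assuming the idea is to represent one numeric value by its digit codes under several numeral systems (each digit would then index a learnable embedding), the decomposition step might look like the following. The function names, the choice of bases (2, 3, 10), the fixed width, and the clamping of out-of-range values are all hypothetical choices for this illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence


def to_digits(value: int, base: int, width: int) -> List[int]:
    """Return `width` digits of `value` in `base`, most significant first.

    Values too large for `width` digits are clamped to the largest
    representable number (an assumption made for this sketch).
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    value = min(value, base ** width - 1)
    digits = []
    for _ in range(width):
        digits.append(value % base)
        value //= base
    return digits[::-1]


def mnse_codes(value: int, bases: Sequence[int] = (2, 3, 10),
               width: int = 4) -> Dict[int, List[int]]:
    """Encode one numeric feature as digit codes under several bases."""
    return {base: to_digits(value, base, width) for base in bases}


# 11 is 1011 in binary, 0102 in ternary, and 0011 in decimal.
print(mnse_codes(11))
# {2: [1, 0, 1, 1], 3: [0, 1, 0, 2], 10: [0, 0, 1, 1]}
```

In a model, each (base, position, digit) triple would typically map to one row of an embedding table, so nearby values share some of their digit embeddings; that lookup is omitted here.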

Theorems & Definitions (1)

Definition 7.1: Information Abundance
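
The listing gives only the name of Definition 7.1. A minimal sketch follows, assuming the formulation commonly used in work on embedding collapse: the sum of an embedding matrix's singular values divided by its largest singular value, so that a value near 1 indicates collapse onto a single direction. The function name, the zero-matrix convention, and the toy matrices are assumptions made for this example.

```python
import numpy as np


def information_abundance(embedding: np.ndarray) -> float:
    """Sum of singular values divided by the largest singular value."""
    singular_values = np.linalg.svd(embedding, compute_uv=False)
    largest = singular_values.max()
    if largest == 0.0:
        return 0.0  # convention for an all-zero matrix (assumption)
    return float(singular_values.sum() / largest)


# Rank-1 table: every row is a multiple of the same vector (fully collapsed).
collapsed = np.outer(np.arange(1.0, 6.0), np.ones(4))

# Table with orthonormal columns: all four dimensions carry equal energy.
balanced = np.eye(5, 4)

print(information_abundance(collapsed))  # approximately 1.0
print(information_abundance(balanced))   # 4.0
```

Under this formulation the score ranges from 1 up to the embedding dimension, which makes it a convenient per-table diagnostic for how many dimensions are effectively in use.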
