MDL: A Unified Multi-Distribution Learner in Large-scale Industrial Recommendation through Tokenization

Shanlei Mu; Yuchen Jiang; Shikang Wu; Shiyong Hong; Tianmu Sha; Junjie Zhang; Jie Zhu; Zhe Chen; Zhe Wang; Jingjian Lin

TL;DR

MDL introduces a tokenization-based framework that unifies multi-scenario and multi-task learning in large-scale industrial recommender systems. It maps features, scenarios, and tasks into a common token space and applies three interacting mechanisms (feature token self-attention, domain-feature attention, and domain-fused aggregation), so that scenario and task tokens activate the model's large parameter space in a bottom-up, layer-wise manner. Offline results on a real Douyin dataset show consistent improvements across scenarios and tasks, and a one-month online A/B test on Douyin Search shows a 0.0626% gain in LT30 and a 0.3267% reduction in change query rate, supporting production deployment. This prompting-inspired approach enables deeper, distribution-aware modeling and scalable performance gains in industrial settings.

Abstract

Industrial recommender systems increasingly adopt multi-scenario learning (MSL) and multi-task learning (MTL) to handle diverse user interactions and contexts, but existing approaches suffer from two critical drawbacks: (1) underutilization of large-scale model parameters due to limited interaction with complex feature modules, and (2) difficulty in jointly modeling scenario and task information in a unified framework. To address these challenges, we propose a unified Multi-Distribution Learning (MDL) framework, inspired by the "prompting" paradigm in large language models (LLMs). MDL treats scenario and task information as specialized tokens rather than auxiliary inputs or gating signals. Specifically, we introduce a unified information tokenization module that transforms features, scenarios, and tasks into a unified tokenized format. To facilitate deep interaction, we design three synergistic mechanisms: (1) feature token self-attention for rich feature interactions, (2) domain-feature attention for scenario/task-adaptive feature activation, and (3) domain-fused aggregation for joint distribution prediction. By stacking these interactions, MDL enables scenario and task information to "prompt" and activate the model's vast parameter space in a bottom-up, layer-wise manner. Extensive experiments on real-world industrial datasets demonstrate that MDL significantly outperforms state-of-the-art MSL and MTL baselines. Online A/B testing on the Douyin Search platform over one month yields a 0.0626% improvement in LT30 and a 0.3267% reduction in change query rate. MDL has been fully deployed in production, serving hundreds of millions of users daily.
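The three mechanisms named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not give the projections, normalization, or aggregation function, so this sketch assumes projection-free scaled dot-product attention, residual connections, and a sum-then-sigmoid aggregation. The names `attend`, `mdl_block`, and `predict` are hypothetical.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    # Scaled dot-product attention with no learned projections
    # (an assumption made to keep the sketch short).
    d = len(keys_values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

def add(xs, ys):
    # Residual connection: elementwise sum of two token lists.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def mdl_block(feature_tokens, domain_tokens):
    # (1) Feature token self-attention: features attend to features.
    feature_tokens = add(feature_tokens, attend(feature_tokens, feature_tokens))
    # (2) Domain-feature attention: scenario/task tokens query the features.
    domain_tokens = add(domain_tokens, attend(domain_tokens, feature_tokens))
    return feature_tokens, domain_tokens

def predict(scenario_token, task_token):
    # (3) Domain-fused aggregation, assumed here to be an elementwise sum
    # of one scenario token and one task token, then mean and sigmoid.
    fused = [s + t for s, t in zip(scenario_token, task_token)]
    logit = sum(fused) / len(fused)
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
dim = 4
# Hypothetical inputs: 5 feature tokens, then 1 scenario token + 2 task tokens.
features = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
domains = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

# Stacking blocks lets domain tokens re-read the features at every layer.
for _ in range(2):
    features, domains = mdl_block(features, domains)

scenario, tasks = domains[0], domains[1:]
probs = [predict(scenario, t) for t in tasks]  # one score per task
```

In this sketch the scenario and task tokens are ordinary rows in the token list rather than gating signals, which is the design point the abstract emphasizes: each stacked block gives them another pass over the feature tokens before the per-task scores are formed.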

Paper Structure (25 sections, 18 equations, 4 figures, 3 tables)

Introduction
PRELIMINARIES
Methodology
Unified Information Tokenization
Feature Tokenization
Scenario and Task Tokenization
Domain-aware ALL-Token Interaction
Feature Token Self-Interaction
Feature-Scenario/Task Token Interaction
Scenario-Task Token Interaction
MDL Block
EXPERIMENTS
Experimental Setup
Datasets.
Evaluation Settings.
...and 10 more sections

Figures (4)

Figure 1: The overall framework of our approach MDL.
Figure 2: Scaling laws between click QAUC gain on single-column search scenario and model parameters/FLOPs of different models.
Figure 3: Attention distribution between task tokens and feature tokens on different layers. The x-axis shows the feature token index and the y-axis shows the average attention weight assigned to the corresponding feature token.
Figure 4: Attention distribution between scenario tokens and feature tokens. The x-axis shows the feature token index and the y-axis shows the average attention weight assigned to the corresponding feature token.
