Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization

Jiangxia Cao; Pengbo Xu; Yin Cheng; Kaiwei Guo; Jian Tang; Shijun Wang; Dewei Leng; Shuang Yang; Zhaojie Liu; Yanan Niu; Guorui Zhou; Kun Gai

TL;DR

Pantheon tackles the bottleneck of hand-crafted ensemble sorts in industrial RecSys by introducing a neural ensemble that is jointly trained with the ranking model and leverages high-dimensional task representations. It adopts Iterative Pareto Policy Optimization (IPPO) to automatically search for Pareto-optimal weight configurations, enabling balanced performance across multiple objectives. Offline GAUC evaluations and online A/B tests on Kuaishou's live-streaming platform show consistent improvements over the traditional formula-based ensemble sort, with average gains around 1% across metrics and notable exposure-ecology shifts. The approach demonstrates a scalable, personalized ensemble fusion framework with potential to extend to reward-driven generative recommendations in large-scale, real-time systems.

Abstract

In this paper, we present Pantheon, our milestone ensemble sort work, together with first-hand practical experience. Pantheon transforms ensemble sorting from a "human-curated art" into a "machine-optimized science". Compared with formulation-based ensemble sort, Pantheon has the following advantages: (1) Personalized joint training: Pantheon is trained jointly with the real-time ranking model, so it can accurately capture users' ever-changing personalized interests. (2) Representation inheritance: instead of the highly compressed Pxtrs, Pantheon takes fine-grained hidden states as model input, which lets it benefit from the ranking model and increases its model complexity. Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise an iterative Pareto policy optimization (IPPO) strategy that considers multiple objectives at the same time. To our knowledge, this is the first work to replace the entire formulation-based ensemble sort in an industrial RecSys; it has been fully deployed in Kuaishou's live-streaming services, serving 400 million users daily.
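
The notion of a Pareto-optimal weight configuration that motivates IPPO can be illustrated with a small, generic sketch. This is not the paper's IPPO procedure, which is not specified in this summary; it only shows what it means for one configuration to dominate another when several objectives are scored at once. The configuration names (`w_a` to `w_d`), the two objectives, and all scores are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, score in candidates.items():
        dominated = any(
            dominates(other_score, score)
            for other_name, other_score in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores on two objectives for four weight configurations.
# These numbers are made up; they are not results from the paper.
candidates = {
    "w_a": (0.70, 0.60),
    "w_b": (0.65, 0.66),
    "w_c": (0.64, 0.59),  # worse than w_a on both objectives
    "w_d": (0.72, 0.55),
}

front = pareto_front(candidates)
print(front)
```

In this toy setting `w_c` drops out because `w_a` is better on both objectives, while the remaining three each trade one objective against the other. Choosing among such non-dominated configurations is the balancing problem that a multi-objective ensemble sort has to resolve.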
