A Self-boosted Framework for Calibrated Ranking

Shunyu Zhang; Hu Liu; Wentian Bao; Enyun Yu; Yang Song

TL;DR

SBCR tackles calibrated ranking in industrial systems by decoupling ranking from calibration and enabling extensive data shuffling through a self-boosted ranking module. The self-boosted pairwise loss leverages dumped online scores and labels from an older deployed model to provide distribution-aware guidance without aggregating the entire candidate list, while a monotone, piecewise-linear calibration module maps predicted probabilities to calibrated values via a query-dependent function $g(\hat{y};q)$ learned with a Softmax-based interval parameterization. The method is validated on a billion-scale production dataset from Kuaishou, where SBCR achieves superior ranking and calibration metrics and yields meaningful online improvements in CTR, view count, and user engagement with minimal serving overhead. These results demonstrate practical impact for calibrated ranking in real-time systems and point to future work on more flexible monotone calibrators and broader deployment scenarios.
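The monotone, piecewise-linear calibrator mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes equal-width input intervals on $[0,1]$ and takes the per-query logits as a plain array, whereas in SBCR they would presumably come from a network conditioned on the query $q$. The names `calibrate` and `logits` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def calibrate(y_hat, logits):
    """Monotone piecewise-linear map g(y_hat; q) from [0, 1] to [0, 1].

    logits: shape (K,), assumed here to be a per-query output; one value
    per interval. Softmax turns them into positive increments that sum
    to 1, so the cumulative knot heights are strictly increasing.
    """
    k = len(logits)
    increments = softmax(np.asarray(logits, dtype=float))
    knots_y = np.concatenate([[0.0], np.cumsum(increments)])
    knots_x = np.linspace(0.0, 1.0, k + 1)  # assumption: equal-width bins
    return np.interp(y_hat, knots_x, knots_y)

# Uniform logits give equal increments, so g reduces to the identity.
print(calibrate(0.3, np.zeros(4)))
```

Because every Softmax output is positive, the map preserves the ordering of the uncalibrated scores, which is what lets a calibrator of this kind adjust absolute values without changing the ranking.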

Abstract

Scale-calibrated ranking systems, which pursue accurate ranking quality and calibrated probabilistic predictions simultaneously, are now ubiquitous in real-world applications. For instance, in an advertising ranking system, the predicted click-through rate (CTR) is used for ranking and must also be calibrated for downstream cost-per-click ad bidding. Recently, multi-objective methods have been widely adopted as the standard approach to Calibrated Ranking (CR). They combine two loss functions: a pointwise loss that focuses on calibrated absolute values and a ranking loss that emphasizes relative orderings. However, when applied to industrial online applications, existing multi-objective CR approaches still suffer from two crucial limitations. First, previous methods need to aggregate the full candidate list within a single mini-batch to compute the ranking loss. This aggregation strategy precludes extensive data shuffling, which has long been shown to help prevent overfitting, and thus degrades training effectiveness. Second, existing multi-objective methods apply the two inherently conflicting loss functions to a single probabilistic prediction, which results in a sub-optimal trade-off between calibration and ranking. To tackle these two limitations, we propose a Self-Boosted framework for Calibrated Ranking (SBCR).
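The conventional multi-objective setup described above can be sketched as below. This is an illustrative sketch under assumptions, not a specific published method: it uses logistic losses for both terms and an $\alpha$-weighted sum (the weighting form is inferred from the relative ranking weight $(1-\alpha)/\alpha$ in Figure 3). All function names are hypothetical. Note that the pairwise term needs every candidate of a query in the same batch, which is the first limitation the abstract identifies.

```python
import numpy as np

def pointwise_logloss(logits, labels):
    """Mean binary cross-entropy computed from raw logits."""
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

def pairwise_logistic_loss(logits, labels):
    """Mean logistic loss over all (positive, negative) pairs of ONE query.

    Requires the full candidate list of the query to be present at once.
    """
    diff = logits[:, None] - logits[None, :]   # s_i - s_j
    mask = labels[:, None] > labels[None, :]   # pairs with y_i > y_j
    if not mask.any():
        return 0.0
    return float(np.mean(np.logaddexp(0.0, -diff[mask])))

def multi_objective_loss(logits, labels, alpha=0.5):
    """Assumed combination: alpha * pointwise + (1 - alpha) * pairwise."""
    return (alpha * pointwise_logloss(logits, labels)
            + (1.0 - alpha) * pairwise_logistic_loss(logits, labels))

labels = np.array([1.0, 0.0, 0.0])
logits = np.array([1.5, -0.5, 0.2])
print(multi_objective_loss(logits, labels, alpha=0.5))
```

Both terms act on the same `logits`, which is the second limitation: a single prediction has to serve two objectives at once.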

Paper Structure (24 sections, 1 theorem, 14 equations, 3 figures, 6 tables)

Introduction
Related Work
Methodology
Preliminaries
Multi-objective Calibrated Ranking
Existing Methods
Limitations of Existing Multi-Objective CR
Self-Boosted Calibrated Ranking
The Self-Boosted Ranking Module
The Calibration Module
The Overall Architecture of SBCR and Training Tricks
Experiments
Experiment Setup
Datasets.
Implementation Details.
...and 9 more sections

Key Result

Theorem 1

$\mathcal{L}_{pair}$ and $\mathcal{L}_{point}$ have distinct optimal solutions.

Figures (3)

Figure 1: Performance comparison of different data shuffling strategies. Evaluation metrics include Logloss, NDCG@10 (widely used in ranking), and the total amount of time users spend on Kuaishou (the most important metric in our online A/B test). Extensive item-level data shuffling (top) significantly outperforms query-level data shuffling (bottom), where the whole candidate item list retrieved for a single request is aggregated in a single mini-batch. A theoretical explanation is given in Sec. \ref{sec:limit}. This experimental result validates the advantage of extensive data shuffling and motivates us to propose a novel ranking loss that enables it.
Figure 2: The architecture of the proposed Self-Boosted framework for Calibrated Ranking. Middle: SBCR consists of two modules: a self-boosted ranking module (SBR) trained by a multi-objective loss (pointwise and self-boosted pairwise loss) and a calibration module. Left: the details of the proposed self-boosted pairwise loss. Using dumped ranking scores $\widetilde{\mathbf s_q}$ from the online deployed model, we enable both comparisons between samples associated with the same query and extensive sample-level data shuffling. Right: the proposed calibration module that decouples the ranking and calibration objectives to avoid the conflict.
Figure 3: The sensitivity of SBCR to the relative ranking weight $(1-\alpha)/\alpha$ (Eq. \ref{eq:multiboost}). Higher GAUC and lower ECE indicate better performance.
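The idea behind the self-boosted pairwise loss in Figure 2 can be sketched for a single training sample as below. This is a hedged illustration, not the paper's exact formula: it assumes each sample carries the dumped online scores $\widetilde{\mathbf s_q}$ and labels of its query's candidates as extra features, and that a logistic pairwise loss compares the current model's logit against those dumped scores. The function and argument names are hypothetical.

```python
import numpy as np

def self_boosted_pairwise_loss(logit, label, dumped_scores, dumped_labels):
    """Pairwise logistic loss of ONE sample against dumped online scores.

    logit, label:   current model output and label for this sample.
    dumped_scores:  scores the deployed model gave to the candidates of
                    the same query (assumed stored with the sample).
    dumped_labels:  the observed labels of those candidates.
    """
    dumped_scores = np.asarray(dumped_scores, dtype=float)
    dumped_labels = np.asarray(dumped_labels, dtype=float)
    # +1 where this sample should rank above candidate j, -1 where below,
    # 0 where the labels tie (no ordering constraint).
    sign = np.sign(label - dumped_labels)
    margins = sign * (logit - dumped_scores)
    active = sign != 0
    if not active.any():
        return 0.0
    return float(np.mean(np.logaddexp(0.0, -margins[active])))

# A positive sample compared with one dumped negative and one dumped positive.
print(self_boosted_pairwise_loss(2.0, 1.0, [0.0, 3.0], [0.0, 1.0]))
```

Since the comparison targets travel with the sample, the loss never needs the other candidates of the query in the same mini-batch, so samples can be shuffled freely.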

Theorems & Definitions (1)

Theorem 1
