A Unified Knowledge-Distillation and Semi-Supervised Learning Framework to Improve Industrial Ads Delivery Systems

Hamid Eghbalzadeh, Yang Wang, Rui Li, Yuji Mo, Qin Ding, Jiaxiang Fu, Liang Dai, Shuo Gu, Nima Noorshams, Sem Park, Bo Long, Xue Feng

TL;DR

The paper tackles miscalibration and training-serving data gaps in multi-stage industrial ads ranking by introducing UKDSL, a unified framework that combines cross-stage knowledge distillation, semi-supervised learning from foundation models (SSLFM), and semi-supervised feature selection (SSFS). It shows theoretically and empirically that ranking pipelines suffer from inherent miscalibration, particularly when early stages select top candidates. UKDSL mitigates this through teacher-student distillation across stages, using unlabeled data and foundation models to improve calibration and performance. Key contributions include a formal analysis of bias in ranked items, a scalable cross-stage distillation approach, a perturbation-based SSFS method, and SSLFM with multi-task learning, all demonstrated in industrial-scale deployments. The reported deployment improves calibration and efficiency across billions of users and diverse surfaces, suggesting that improvements driven by unlabeled data are feasible at industrial scale.

Abstract

Industrial ads ranking systems conventionally rely on labeled impression data, which leads to challenges such as overfitting, slower incremental gains from model scaling, and biases due to discrepancies between training and serving data. To overcome these issues, we propose a Unified framework for Knowledge-Distillation and Semi-supervised Learning (UKDSL) for ads ranking, which enables training models on significantly larger and more diverse datasets, thereby reducing overfitting and mitigating training-serving data discrepancies. We provide detailed formal analysis and numerical simulations of the inherent miscalibration and prediction bias of multi-stage ranking systems, and show empirical evidence of the proposed framework's capability to mitigate them. Compared to prior work, UKDSL enables models to learn from a much larger set of unlabeled data, improving performance while remaining computationally efficient. Finally, we report the successful deployment of UKDSL in an industrial setting across various ranking models, serving users at multi-billion scale across various surfaces, geographical locations, and clients, and optimizing for various events; to the best of our knowledge, this deployment is the first of its kind in terms of the scale and efficiency at which it operates.
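
The selection effect behind this miscalibration can be illustrated with a small simulation. The following is a minimal sketch, not the paper's simulation: it assumes calibration is measured as the ratio of summed predictions to summed ground-truth rates, and the function name `simulate_calibration`, the uniform rate range, and the Gaussian noise level are all hypothetical choices made for illustration. A model that is unbiased over all candidates tends to over-predict on the top-k candidates it selects itself, because ranking by a noisy score favors items whose noise happened to be positive.

```python
import random


def simulate_calibration(n_ads=10000, k=500, noise=0.02, seed=0):
    """Return (calibration over all ads, calibration over the model's own top-k).

    Calibration here is sum(predictions) / sum(ground truth); this definition
    is an assumption and may differ from the paper's cal(., .).
    """
    rng = random.Random(seed)
    # Hypothetical ground-truth event rates (a stand-in for M_0).
    truth = [rng.uniform(0.01, 0.10) for _ in range(n_ads)]
    # Hypothetical stage model: the true rate plus zero-mean Gaussian noise,
    # so it is unbiased when averaged over all candidates.
    pred = [t + rng.gauss(0.0, noise) for t in truth]

    def cal(indices):
        return sum(pred[i] for i in indices) / sum(truth[i] for i in indices)

    all_idx = list(range(n_ads))
    # The stage keeps the k candidates with the highest predicted rate.
    top_idx = sorted(all_idx, key=lambda i: pred[i], reverse=True)[:k]
    return cal(all_idx), cal(top_idx)


if __name__ == "__main__":
    overall, top_k = simulate_calibration()
    print(f"calibration on all candidates: {overall:.3f}")
    print(f"calibration on selected top-k: {top_k:.3f}")
```

Under these assumptions, the first value stays close to 1 while the second exceeds 1, which is the kind of bias a later stage inherits when it only sees the candidates an earlier stage selected.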

Paper Structure

This paper contains 14 sections, 3 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: a) a block-diagram of a multi-stage ranking system. b) demonstration of unlabeled and labeled data used in industrial systems.
  • Figure 2: Simulation results for model calibrations. a) $\text{cal}(1,2)$. b) $\text{cal}(1,0)$. c) $\text{cal}(2,0)$. d) all previous plots alongside each other. Notes: $\mathbf{M_0}$ denotes the ground truth. All calibrations are calculated on the top $k_1$ ads candidates selected by the first stage model (${\mathbb{S}}_1$).
  • Figure 3: UKDSL: Unified Framework for Knowledge Distillation and Semi-Supervised Learning
  • Figure 4: Semi-Supervised Cross-Stage Distillation illustrated with a 3-stage system: each model acts as the teacher for the previous stage on both labeled and unlabeled data
  • Figure 5: This diagram illustrates the Semi-Supervised Feature Selection module used by UKDSL. A set of unbiased features is selected and combined with the regular biased features to create the final set of features for use in the model.
  • ...and 1 more figure
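
The teacher-student arrangement described in the Figure 4 caption can be sketched as a per-example loss. This is an illustrative assumption rather than the paper's objective: the function names, the binary cross-entropy form, and the mixing weight `alpha` are hypothetical. The sketch shows only that a later-stage model's prediction can serve as a soft target for the earlier-stage model, so unlabeled examples still contribute a training signal.

```python
import math


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between a predicted probability and a target in [0, 1]."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_pred, teacher_pred, label=None, alpha=0.5):
    """Hypothetical cross-stage loss for one example.

    student_pred: prediction of the earlier-stage (student) model.
    teacher_pred: prediction of the later-stage (teacher) model, used as a soft target.
    label: observed 0/1 outcome, or None for unlabeled data.
    alpha: assumed weight on the teacher term when a label is available.
    """
    soft = bce(student_pred, teacher_pred)
    if label is None:
        # Unlabeled example: the teacher's prediction is the only supervision.
        return soft
    # Labeled example: mix the hard-label loss with the teacher term.
    return (1.0 - alpha) * bce(student_pred, label) + alpha * soft


if __name__ == "__main__":
    print(distillation_loss(0.3, 0.3))           # unlabeled, student matches teacher
    print(distillation_loss(0.6, 0.3))           # unlabeled, student disagrees
    print(distillation_loss(0.3, 0.3, label=1))  # labeled example
```

In this sketch the unlabeled loss is smallest when the student reproduces the teacher's prediction, which is the behavior a distillation term is meant to encourage.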