A Unified Knowledge-Distillation and Semi-Supervised Learning Framework to Improve Industrial Ads Delivery Systems
Hamid Eghbalzadeh, Yang Wang, Rui Li, Yuji Mo, Qin Ding, Jiaxiang Fu, Liang Dai, Shuo Gu, Nima Noorshams, Sem Park, Bo Long, Xue Feng
TL;DR
The paper tackles miscalibration and training-serving data gaps in multi-stage industrial ads ranking by introducing UKDSL, a unified framework that combines cross-stage knowledge distillation, semi-supervised learning from foundation models (SSLFM), and semi-supervised feature selection (SSFS). It shows theoretically and empirically that multi-stage ranking pipelines are inherently miscalibrated, particularly when early stages select only the top candidates. UKDSL mitigates this through teacher-student distillation across stages, using unlabeled data and foundation models to improve calibration and performance. Key contributions include a formal analysis of bias in ranked items, a scalable cross-stage distillation approach, a perturbation-based SSFS method, and SSLFM with multi-task learning, all demonstrated in industrial-scale deployments. The reported deployments improve calibration and efficiency for billions of users across diverse surfaces, suggesting that improvements driven by unlabeled data are feasible at industrial scale.
Abstract
Industrial ads ranking systems conventionally rely on labeled impression data, which leads to challenges such as overfitting, slower incremental gains from model scaling, and biases due to discrepancies between training and serving data. To overcome these issues, we propose a Unified framework for Knowledge-Distillation and Semi-supervised Learning (UKDSL) for ads ranking, which enables training models on significantly larger and more diverse datasets, thereby reducing overfitting and mitigating training-serving data discrepancies. We provide a detailed formal analysis and numerical simulations of the inherent miscalibration and prediction bias of multi-stage ranking systems, and show empirical evidence of the proposed framework's ability to mitigate them. Compared to prior work, UKDSL enables models to learn from a much larger set of unlabeled data, improving performance while remaining computationally efficient. Finally, we report the successful deployment of UKDSL in an industrial setting across various ranking models, serving users at multi-billion scale across various surfaces, geographical locations, and clients, and optimizing for various events. To the best of our knowledge, this deployment is the first of its kind in terms of the scale and efficiency at which it operates.
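The selection-induced prediction bias mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's simulation; it assumes a hypothetical Gaussian setup in which an early-stage model predicts each item's true score plus zero-mean noise, so the predictions are unbiased over the full candidate pool. The function name `selection_bias` and all parameter values are illustrative assumptions. Restricting attention to the top-k items by predicted score makes the average prediction exceed the average true score, because items with positive noise are more likely to be selected.

```python
import numpy as np


def selection_bias(n_items=100_000, k=1_000, noise_std=0.5, seed=0):
    """Toy model of an early ranking stage (illustrative assumption only).

    Returns the mean prediction error (predicted minus true score)
    over all candidates and over the top-k candidates by prediction.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical true scores of the candidate items.
    true_score = rng.normal(loc=0.0, scale=1.0, size=n_items)

    # Early-stage predictions: true score plus zero-mean noise,
    # so the model is unbiased on the full candidate pool.
    predicted = true_score + rng.normal(loc=0.0, scale=noise_std, size=n_items)

    # The stage forwards only the k items with the highest predictions.
    top_k = np.argsort(predicted)[-k:]

    overall_bias = float(np.mean(predicted - true_score))
    selected_bias = float(np.mean(predicted[top_k] - true_score[top_k]))
    return overall_bias, selected_bias


if __name__ == "__main__":
    overall, selected = selection_bias()
    print(f"mean error over all candidates: {overall:+.4f}")
    print(f"mean error over top-k selected: {selected:+.4f}")
```

In this toy setting the error over all candidates stays close to zero while the error over the selected items is clearly positive, which is the kind of over-prediction on ranked items that a later stage trained only on those items would inherit.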
