Multi-objective Learning to Rank by Model Distillation
Jie Tang, Huiji Gao, Liwei He, Sanjeev Katariya
TL;DR
This work presents MO-LTR-MD, a distillation-based approach to multi-objective learning to rank (MO-LTR) that reformulates MO-LTR as a model distillation problem. Objective-specific teachers are trained first. A student then learns from both hard ground-truth labels and soft-labels derived from the teacher models. This mitigates data imbalance, removes the need for online score aggregation weights, and accommodates ad-hoc non-differentiable objectives. Empirical results on Airbnb data show an improved primary objective (conversion rate, CVR) with balanced secondary objectives, along with additional gains in NDCG and reduced model irreproducibility via self-distillation. The framework also shows how soft-labels can encode ad-hoc goals and transfer knowledge across model versions, which offers practical benefits for production ranking systems.
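A student that "learns from both hard ground-truth labels and soft-labels derived from teacher models" can be illustrated with a small listwise loss. The sketch below is only an illustration under stated assumptions and is not the paper's exact formulation. It assumes softmax cross-entropy over the candidate listings of one query, a fixed mixing weight `alpha`, a distillation `temperature`, and offline `teacher_weights` that blend the objective-specific teachers into a single soft-label. All of these names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the candidate listings of one query."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], scores: Sequence[float]) -> float:
    """Listwise softmax cross-entropy between a target distribution and student scores."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def distillation_loss(
    student_scores: Sequence[float],
    hard_labels: Sequence[float],
    teacher_scores: Sequence[Sequence[float]],
    teacher_weights: Sequence[float],
    alpha: float = 0.5,
    temperature: float = 1.0,
) -> float:
    """Hypothetical student loss mixing hard labels with blended teacher soft-labels.

    hard_labels: e.g. 1.0 for the booked listing, 0.0 otherwise (assumed primary label).
    teacher_scores: one score list per objective-specific teacher (assumed).
    teacher_weights: assumed offline weights used to blend teachers at training time.
    """
    total = sum(hard_labels)
    if total <= 0:
        raise ValueError("sketch assumes at least one positive label per query")
    # Normalize the hard labels into a target distribution over listings.
    hard_target = [y / total for y in hard_labels]

    # Blend each teacher's tempered distribution into one soft-label target.
    soft_target = [0.0] * len(student_scores)
    for w, scores in zip(teacher_weights, teacher_scores):
        probs = softmax([s / temperature for s in scores])
        soft_target = [acc + w * p for acc, p in zip(soft_target, probs)]
    z = sum(soft_target)
    soft_target = [t / z for t in soft_target]

    # The student is fit to both targets; temperature applies to the soft term only.
    hard_loss = cross_entropy(hard_target, student_scores)
    soft_loss = cross_entropy(soft_target, [s / temperature for s in student_scores])
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    booked = [1.0, 0.0, 0.0]
    # e.g. a booking teacher and a hypothetical review-rating teacher
    teachers = [[1.5, 1.0, -0.5], [0.2, 1.2, 0.0]]
    print(distillation_loss(student, booked, teachers, teacher_weights=[0.7, 0.3]))
```

In this sketch the teachers are blended only when the training targets are built. At serving time, ranking would need just the student's single score. That is one way to read the claim that online score aggregation weights are no longer needed.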
Abstract
In online marketplaces, the objective of search ranking is not only purchase or conversion (the primary objective) but also the outcomes of those purchases (secondary objectives), e.g. order cancellation (or return), review rating, customer service inquiries, and long-term platform growth. Multi-objective learning to rank has been widely studied as a way to balance primary and secondary objectives. However, traditional approaches in industry face several challenges: expensive parameter tuning that leads to sub-optimal solutions, imbalanced data sparsity across objectives, and incompatibility with ad-hoc objectives. In this paper, we propose a distillation-based solution for multi-objective ranking. It optimizes the end-to-end ranking system at Airbnb across multiple ranking models trained on different objectives, and it incorporates various considerations for training and serving efficiency to meet industry standards. We find that it performs much better than traditional approaches: it not only increases the primary objective by a large margin but also meets the secondary-objective constraints and improves model stability. We also demonstrate that the proposed system can be further simplified by model self-distillation. In addition, we ran further simulations showing that this approach can efficiently inject ad-hoc, non-differentiable business objectives into the ranking system while still allowing us to balance our optimization objectives.
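The idea of injecting an ad-hoc, non-differentiable business objective can be made concrete with a short sketch. Because the student only needs target values, a rule with no gradient can still reshape the soft-label it is trained on. The construction below is not necessarily the paper's. It assumes a hypothetical boolean `rule_flags` per listing (some yes/no business rule) and a hypothetical `boost` factor that multiplies the target mass of flagged listings before renormalization.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the candidate listings of one query."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inject_adhoc_objective(
    soft_target: Sequence[float],
    rule_flags: Sequence[bool],
    boost: float = 2.0,
) -> List[float]:
    """Reweight a soft-label distribution with a hypothetical non-differentiable rule.

    rule_flags marks listings that satisfy an ad-hoc business rule. This is a hard
    yes/no decision, so it has no gradient. It only changes the training target, so
    gradients still flow solely through the student's own scores.
    """
    adjusted = [p * (boost if flag else 1.0) for p, flag in zip(soft_target, rule_flags)]
    z = sum(adjusted)
    return [a / z for a in adjusted]


if __name__ == "__main__":
    teacher_target = softmax([1.0, 1.0, 0.0])  # soft-label from some teacher (assumed)
    target = inject_adhoc_objective(teacher_target, [False, True, False], boost=2.0)
    print(target)  # listing 1 now carries twice the target mass of listing 0
```

The adjusted `target` would then replace the teacher soft-label in whatever distillation loss the student uses. The strength of the business rule is controlled offline through `boost`, so no change to the serving path is needed.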
