Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce
Yuan Wang, Zhiyu Li, Changshuo Zhang, Sirui Chen, Xiao Zhang, Jun Xu, Quan Lin
TL;DR
The paper tackles the challenge of delayed user feedback in online re-ranking for e-commerce. It introduces Learning At Serving Time (LAST), which uses a surrogate evaluator to provide instructional signals and computes a request-specific parameter delta \Delta \boldsymbol{\theta}^*(u, C) for user u and candidate set C, adapting the deployed model on the fly and yielding the list \hat{L}^*_{\text{LAST}} = G(u, C; \boldsymbol{\theta} + \Delta \boldsymbol{\theta}^*(u, C)). LAST comes in cascade and parallel variants, the latter using a gradient-exploration module to propose candidate modifications; in both, the modification is discarded after serving to keep online learning stable. The approach integrates with existing online learning systems and reports gains in offline metrics (MAP, NDCG) and in business metrics (purchases, clicks) in large-scale online experiments. These results suggest LAST offers a practical path to more responsive, context-aware recommendations without waiting for delayed feedback; code is released for reproducibility.
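To make the formula concrete, here is a minimal Python sketch of a parallel-style search for \Delta \boldsymbol{\theta}^*(u, C). Everything in it is a hypothetical stand-in, not the authors' implementation: `generate_list` plays the role of G with a linear scorer, `surrogate_evaluate` is a toy position-discounted evaluator, and random Gaussian perturbations replace the paper's gradient-exploration module, whose actual design is not described here. The zero delta is always among the candidates, so under the surrogate the served list never scores below the deployed model's list.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_list(theta: Vector, candidates: List[Vector], k: int) -> List[int]:
    """Hypothetical generator G(u, C; theta): linear item scores, return top-k indices."""
    scores = [dot(theta, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])[:k]


def surrogate_evaluate(ranking: List[int], candidates: List[Vector], w_eval: Vector) -> float:
    """Hypothetical surrogate evaluator: position-discounted sum of predicted utilities."""
    return sum(dot(w_eval, candidates[i]) / (pos + 1) for pos, i in enumerate(ranking))


def last_parallel(theta: Vector, candidates: List[Vector], w_eval: Vector,
                  k: int = 3, n_explore: int = 16, scale: float = 0.5,
                  seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    # Candidate deltas: zero (fallback to the deployed model) plus random directions,
    # standing in for the paper's gradient-exploration module (an assumption).
    deltas = [[0.0] * len(theta)]
    for _ in range(n_explore):
        deltas.append([rng.gauss(0.0, scale) for _ in theta])

    best_delta, best_score = deltas[0], float("-inf")
    for delta in deltas:
        theta_mod = [t + d for t, d in zip(theta, delta)]  # theta + delta, a local copy
        ranking = generate_list(theta_mod, candidates, k)
        score = surrogate_evaluate(ranking, candidates, w_eval)
        if score > best_score:  # strict: ties keep the earlier (zero) delta
            best_delta, best_score = delta, score

    # Serve G(u, C; theta + delta*); the caller's theta is never modified.
    theta_star = [t + d for t, d in zip(theta, best_delta)]
    return generate_list(theta_star, candidates, k), best_score
```

Because each candidate delta is scored independently, the loop over `deltas` could run concurrently, which is the intuition behind a parallel variant; the deployed `theta` passed in is left unchanged.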
Abstract
Recommender systems have been widely used in e-commerce, and re-ranking models, which leverage inter-item influence to determine the final recommendation lists, play an increasingly significant role in this domain. Online learning methods keep updating a deployed model with the latest available samples to capture the shifting of the underlying data distribution in e-commerce. However, they depend on real user feedback, such as item purchases, which may be delayed by hours or even days, leading to a lag in model enhancement. In this paper, we propose a novel extension of online learning methods for re-ranking modeling, which we term LAST, an acronym for Learning At Serving Time. It circumvents the requirement of user feedback by using a surrogate model to provide the instructional signal needed to steer model improvement. Upon receiving an online request, LAST finds and applies a model modification on the fly before generating a recommendation result for the request. The modification is request-specific and transient: it is tailored exclusively to the current request to capture that request's specific context. After the request is served, the modification is discarded; because the surrogate model's predictions may be inaccurate, this prevents error propagation and stabilizes the online learning procedure. Most importantly, as a complement to feedback-based online learning methods, LAST can be seamlessly integrated into existing online learning systems to create a more adaptive and responsive recommendation experience. Comprehensive experiments, both offline and online, affirm that LAST outperforms state-of-the-art re-ranking models.
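The transient, request-specific modification and its coexistence with feedback-based online learning can be sketched as follows. `DeployedReRanker`, the `find_delta` callback, and the plain gradient step in `online_update` are hypothetical illustrations, not the paper's code: the shared parameters change only when delayed real feedback arrives, while each LAST delta lives in a local copy that is dropped once the request is served.

```python
from typing import Callable, List

Vector = List[float]


class DeployedReRanker:
    """Hypothetical wrapper: theta changes only through feedback-based online learning."""

    def __init__(self, theta: Vector):
        self.theta = list(theta)
        self.requests_served = 0

    def _rank(self, theta: Vector, candidates: List[Vector]) -> List[int]:
        # Toy generator: linear scores, items sorted from best to worst.
        scores = [sum(t * x for t, x in zip(theta, c)) for c in candidates]
        return sorted(range(len(candidates)), key=lambda i: -scores[i])

    def serve(self, candidates: List[Vector],
              find_delta: Callable[[Vector, List[Vector]], Vector]) -> List[int]:
        # LAST step: a request-specific delta (find_delta is a placeholder for the
        # surrogate-guided search) is applied to a local copy of the parameters.
        delta = find_delta(self.theta, candidates)
        theta_local = [t + d for t, d in zip(self.theta, delta)]
        ranking = self._rank(theta_local, candidates)
        self.requests_served += 1
        # theta_local is not written back: the modification is discarded here.
        return ranking

    def online_update(self, grad: Vector, lr: float = 0.1) -> None:
        # Conventional online learning: runs only when delayed real feedback arrives.
        self.theta = [t - lr * g for t, g in zip(self.theta, grad)]
```

For example, with `theta = [1.0, 0.0]` and a `find_delta` that returns `[-1.0, 1.0]`, the request is ranked with `[0.0, 1.0]`, yet `self.theta` is still `[1.0, 0.0]` afterwards; only `online_update` moves it.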
