Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce

Yuan Wang; Zhiyu Li; Changshuo Zhang; Sirui Chen; Xiao Zhang; Jun Xu; Quan Lin

TL;DR

The paper tackles the challenge of delayed user feedback in online re-ranking for e-commerce. It introduces Learning At Serving Time (LAST), which uses a surrogate evaluator to provide instructional signals and computes a request-specific parameter delta $\Delta \boldsymbol{\theta}^*(u, C)$ to optimize the deployed model on the fly, yielding $\hat{L}^*_{\text{LAST}} = G(u, C; \boldsymbol{\theta} + \Delta \boldsymbol{\theta}^*(u, C))$. LAST supports cascade and parallel variants, the latter employing a gradient-exploration module to propose candidate modifications, all while discarding changes after serving to maintain stability. The approach integrates smoothly with existing online learning systems and demonstrates robust gains in offline metrics (MAP, NDCG) and real-world business metrics (purchases, clicks) in large-scale online experiments. These results suggest LAST provides a practical path to more responsive and context-aware recommendations without awaiting delayed feedback, with code released for reproducibility.

Abstract

Recommender systems have been widely used in e-commerce, and re-ranking models play an increasingly significant role in the domain by leveraging inter-item influence and determining the final recommendation lists. Online learning methods keep updating a deployed model with the latest available samples to capture shifts in the underlying data distribution in e-commerce. However, they depend on the availability of real user feedback, such as item purchases, which may be delayed by hours or even days, leading to a lag in model enhancement. In this paper, we propose a novel extension of online learning methods for re-ranking modeling, which we term LAST, an acronym for Learning At Serving Time. It circumvents the requirement of user feedback by using a surrogate model to provide the instructional signal needed to steer model improvement. Upon receiving an online request, LAST finds and applies a model modification on the fly before generating a recommendation result for the request. The modification is request-specific and transient: it is tailored to the current request alone, so that it captures the specific context of that request. After the request is served, the modification is discarded. Because the predictions of the surrogate model may be inaccurate, discarding it helps to prevent error propagation and stabilizes the online learning procedure. Most importantly, as a complement to feedback-based online learning methods, LAST can be seamlessly integrated into existing online learning systems to create a more adaptive and responsive recommendation experience. Comprehensive experiments, both offline and online, affirm that LAST outperforms state-of-the-art re-ranking models.
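The request-specific, transient modification described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear `rank` actor, the toy `evaluate` surrogate, and the random-search hill climbing that stands in for whatever update rule LAST actually uses. The sketch only shows the serving flow: search for a delta that the surrogate scores higher, serve with the modified parameters, then drop the delta so the deployed parameters stay unchanged.

```python
import random

def rank(theta, items):
    """Hypothetical actor: order items by a linear score theta . features, descending."""
    return sorted(items, key=lambda x: -sum(t * f for t, f in zip(theta, x)))

def evaluate(ranked):
    """Toy surrogate evaluator (assumption): position-discounted sum of the first feature."""
    return sum(x[0] / (pos + 1) for pos, x in enumerate(ranked))

def serve(theta, items, steps=50, scale=0.5, seed=0):
    """Find a request-specific delta, serve with theta + delta, then discard the delta."""
    rng = random.Random(seed)
    delta = [0.0] * len(theta)
    best = evaluate(rank(theta, items))
    for _ in range(steps):
        # Propose a small change to the current delta (stand-in for a gradient step).
        trial = [d + rng.gauss(0.0, scale) for d in delta]
        score = evaluate(rank([t + d for t, d in zip(theta, trial)], items))
        if score > best:  # keep the change only if the surrogate prefers it
            delta, best = trial, score
    served = rank([t + d for t, d in zip(theta, delta)], items)
    # delta is local to this call; theta itself is never mutated.
    return served, best

theta = [0.0, 1.0]                                   # deployed parameters
items = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]         # candidate items for one request
served, score = serve(theta, items)
```

Because the delta lives only inside `serve`, an inaccurate surrogate can affect at most the current request, which mirrors the stability argument made in the abstract.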

Paper Structure (13 sections, 7 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Preliminary
LAST: The Proposed Method
Experiments
Offline Experiments
Performance Analysis
Hyper-parameters Analysis
Online Experiments
Conclusion
Auxiliary Material
Offline Experiment Setup
Offline Experiment 1
Online Experiment Setup

Figures (3)

Figure 1: Classic online learning methods and our new proposal, LAST. $\boldsymbol{\theta}$ means the parameter of a model. The classic methods rely on authentic user feedback, providing enduring non-request-specific updates. LAST provides transient request-specific updates with the help of a surrogate evaluation model. The two can work synergistically to create a more adaptive and responsive online serving system.
Figure 2: Online serving processes of re-ranking models. (a) The traditional re-ranking models. The model generates a recommendation list based on its fixed policy and presents the list directly to the user. (b) The cascade version of LAST. The actor interacts with the evaluator iteratively to improve its list-generating policy for higher evaluations. The list generated in the last iteration is presented to the user. (c) The parallel version of LAST. A separate gradient exploration module suggests potential model modifications. The actor tries out the suggestions, and the evaluator estimates their quality. The list with the highest evaluation is presented to the user.
Figure 3: The impact of hyper-parameters on LAST.
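The parallel version described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative assumption rather than the paper's implementation: `propose_deltas` is a hypothetical stand-in that draws random modifications where the paper uses a gradient exploration module, and `rank` and `evaluate` are toy versions of the actor and the evaluator. Including the zero delta among the candidates is also a choice made here, so that the selected list never scores below the unmodified model under the surrogate.

```python
import random

def rank(theta, items):
    """Hypothetical actor: order items by a linear score theta . features, descending."""
    return sorted(items, key=lambda x: -sum(t * f for t, f in zip(theta, x)))

def evaluate(ranked):
    """Toy surrogate evaluator (assumption): position-discounted sum of the first feature."""
    return sum(x[0] / (pos + 1) for pos, x in enumerate(ranked))

def propose_deltas(dim, k, scale, rng):
    """Stand-in for the exploration module: the zero delta plus k random suggestions."""
    return [[0.0] * dim] + [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(k)]

def serve_parallel(theta, items, k=16, scale=1.0, seed=0):
    """Try every suggested delta independently and return the best-scored list."""
    rng = random.Random(seed)
    candidates = []
    for delta in propose_deltas(len(theta), k, scale, rng):
        ranked = rank([t + d for t, d in zip(theta, delta)], items)
        candidates.append((evaluate(ranked), ranked))
    # The list with the highest surrogate evaluation is the one shown to the user.
    return max(candidates, key=lambda c: c[0])

theta = [0.0, 1.0]
items = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]
best_score, best_list = serve_parallel(theta, items)
```

Since the candidates do not depend on one another, the loop body could be evaluated concurrently, whereas the cascade version has to run its iterations in sequence.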
