Scale-Invariant Learning-to-Rank

Alessio Petrozziello; Christian Sommeregger; Ye-Sheen Lim

TL;DR

The paper tackles the problem of ranking robustness under production-time feature scale changes in learning-to-rank systems. It introduces a scale-invariant architecture that blends a deep path with a wide, interaction-driven path, ensuring score differences are invariant to positive scaling of certain features: $f_n(\tilde{\boldsymbol{x}}) - f_n(\tilde{\boldsymbol{x}}') = f_n(\boldsymbol{x}) - f_n(\boldsymbol{x}')$. Through experiments on ExpediaHotels, RecTour, and MSLR with prediction-time perturbations, the approach maintains or improves NDCG under scaling, while standard LTR baselines degrade. The results demonstrate practical robustness for real-world deployments, albeit with limitations such as reliance on positive features and known scaling-sensitive features, guiding future work toward broader perturbations and automatic detection.

Abstract

At Expedia, learning-to-rank (LTR) models play a key role on our website in sorting and presenting the information most relevant to users, such as search filters, property rooms, amenities, and images. A major challenge in deploying these models is ensuring consistent feature scaling between training and production data, as discrepancies can lead to unreliable rankings. Normalization techniques such as feature standardization and batch normalization could address these issues, but they are impractical in production because of their latency impact and the difficulty of distributed real-time inference. To address this feature-scaling issue, we introduce a scale-invariant LTR framework that combines a deep and a wide neural network to mathematically guarantee scale-invariance in the model at both training and prediction time. We evaluate our framework in simulated real-world scenarios, injecting feature-scale issues by perturbing the test set at prediction time, and show that even with inconsistent train-test scaling, using our framework achieves better performance than not using it.

Paper Structure (5 sections, 1 theorem, 8 equations, 4 tables)

Background
Scale-Invariant Learning-to-Rank
Evaluation
Conclusion
Scale-Invariance Proof

Key Result

Theorem 1

The score difference between $\mathbf{x}_{ij}$ and $\mathbf{x}_{ik}$ is unchanged by scaling, i.e. $f_n(\tilde{\mathbf{x}}_{ij}) - f_n(\tilde{\mathbf{x}}_{ik}) = f_n(\mathbf{x}_{ij}) - f_n(\mathbf{x}_{ik})$, where $\tilde{\mathbf{x}}$ denotes the scaled version of $\mathbf{x}$.
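To make the property concrete, the following is a minimal numerical sketch of what invariance of score differences means. It is not the paper's deep-and-wide architecture: `toy_score` is a hypothetical scorer that is linear in the log of strictly positive features, chosen only because it has the stated property. The sketch also assumes that "scaling" means multiplying each feature by the same positive constant for both items of a query.

```python
import numpy as np

def toy_score(x, w):
    """Hypothetical toy scorer (NOT the paper's model): linear in log-features.

    Because log(c * x) = log(c) + log(x), a positive per-feature scale c adds
    the same constant to every item's score, so score differences are unchanged.
    """
    return np.log(x) @ w

rng = np.random.default_rng(0)
w = rng.normal(size=3)                    # illustrative weights
x_ij = rng.uniform(0.5, 5.0, size=3)      # features of item j for query i
x_ik = rng.uniform(0.5, 5.0, size=3)      # features of item k for query i
c = np.array([10.0, 0.01, 1.0])           # assumed positive per-feature scale change

# Score differences before and after scaling both items by c.
diff_before = toy_score(x_ij, w) - toy_score(x_ik, w)
diff_after = toy_score(c * x_ij, w) - toy_score(c * x_ik, w)
scale_invariant = bool(np.isclose(diff_before, diff_after))

# For contrast, a plain linear scorer on raw features: its score difference
# generally changes under the same scaling.
raw_before = x_ij @ w - x_ik @ w
raw_after = (c * x_ij) @ w - (c * x_ik) @ w

print("log-linear difference preserved:", scale_invariant)
print("raw-linear difference before/after:", raw_before, raw_after)
```

Since the ranking within a query depends only on score differences between items, a scorer with this property produces the same ordering before and after the scale change.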

Theorems & Definitions (2)

Theorem 1
Proof
