From Features to Transformers: Redefining Ranking for Scalable Impact

Fedor Borisyuk, Lars Hertel, Ganesh Parameswaran, Gaurav Srivastava, Sudarshan Srinivasa Ramanujam, Borja Ocejo, Peng Du, Andrei Akterskii, Neil Daftary, Shao Tang, Daqi Sun, Qiang Charles Xiao, Deepesh Nathani, Mohit Kothari, Yun Dai, Guoyao Li, Aman Gupta

TL;DR

LiGR tackles scalable ranking by replacing heavy handcrafted feature engineering with a transformer-based generative ranking model that processes user history and candidate items via learned normalization and setwise attention. It demonstrates minimal feature dependence (as few as $7$ features) while validating scaling laws across model size, data, and context length, achieving meaningful gains in ranking and retrieval in production. The approach includes a session-level setwise ranking layer, semantic IDs via Residual-Quantized VAE to shrink model size to $1.3$B parameters, and efficient single-pass inference for deployment at scale. Online experiments show DAU engagement and time-spent gains, highlighting practical impact for large-scale recommender and retrieval systems.

Abstract

We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the deprecation of most manually designed feature engineering, outperforming the prior state-of-the-art system using only a few features (compared to hundreds in the baseline), (2) validation of the scaling law for ranking systems, showing improved performance with larger models, more training data, and longer context sequences, and (3) simultaneous joint scoring of items in a set-wise manner, leading to automated improvements in diversity. To enable efficient serving of large ranking models, we describe techniques to scale inference effectively using single-pass processing of user history and set-wise attention. We also summarize key insights from various ablation studies and A/B tests, highlighting the most impactful technical approaches.
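
As a rough illustration of what "single-pass processing of user history and set-wise attention" could look like, the sketch below builds an attention mask in which history items attend causally to earlier history, while every candidate attends to the full history and to all other candidates in the same set. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_setwise_mask`, the token layout (history first, candidates last), and the choice of causal attention over history are all hypothetical.

```python
def build_setwise_mask(num_history: int, num_candidates: int) -> list[list[bool]]:
    """Return mask[i][j] = True when position i may attend to position j.

    Assumed layout (hypothetical): positions [0, num_history) hold the user's
    history in chronological order; the remaining positions hold the candidate
    items that are scored jointly.
    """
    total = num_history + num_candidates
    mask = []
    for i in range(total):
        if i < num_history:
            # History row: causal attention, i.e. itself and earlier history only.
            row = [j <= i for j in range(total)]
        else:
            # Candidate row: sees all history and every candidate in the set,
            # so scores can depend on the other items being ranked.
            row = [True] * total
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for 3 history items and 2 candidates ("x" = may attend).
    for row in build_setwise_mask(num_history=3, num_candidates=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because no history row attends to a candidate position, the history representations do not depend on which candidates are being scored. Under this assumed layout, that property is what would let a serving system encode the history once and reuse it across candidate sets.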

Paper Structure

This paper contains 22 sections, 12 figures, 7 tables.

Figures (12)

  • Figure 1: LiGR architecture combining historical attention for feature aggregation and in-session attention.
  • Figure 2: LiGR Transformer architecture using a gating skip-connection.
  • Figure 3: LiGR member tower based on LiGR. Member profile features and member activity history are inputs to a transformer model; green blocks are used when LiGR is enabled.
  • Figure 4: LiGR system architecture.
  • Figure 5: Scaling of normalized evaluation entropy as a function of training FLOPs for LiGR and HSTU.
  • ...and 7 more figures