
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models

Yi Xu, Chaofan Fan, Jinxin Hu, Yu Zhang, Zeng Xiaoyi, Jing Zhang

TL;DR

STORE tackles scalability challenges in ranking models by decoupling feature heterogeneity from interaction complexity. It introduces Semantic Tokenization to compress high-cardinality features into stable semantic IDs, Orthogonal Rotation to diversify interactions among low-cardinality features, and Efficient Attention (MOBA-based routing) to reduce the $O(H^2)$ attention cost over large token sets while preserving accuracy. The approach yields consistent offline gains (AUC, GAUC), online gains (CTR), and a substantial boost in training throughput, demonstrated on public and industrial datasets and validated in online A/B tests. The framework enables more predictable scaling in large-scale recommender systems, offering a practical path to combining high feature diversity with efficient, accurate ranking.
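The Efficient Attention idea can be illustrated with a minimal MoBA-style (Mixture of Block Attention) sketch in pure Python. The summary does not specify STORE's exact routing rule, so this sketch assumes that tokens are split into fixed-size blocks, each block is scored by the query's dot product with its mean-pooled key, and the query attends only to tokens in the top-k blocks; `block_size`, `top_k`, and all function names are hypothetical. As illustrative arithmetic, full attention over $H = 1024$ tokens computes $H^2 = 1{,}048{,}576$ query-key scores per head, whereas routing with block size $B = 64$ and $k = 2$ needs about $H(H/B + kB) = 1024 \times (16 + 128) = 147{,}456$ scores, roughly 7× fewer (ignoring the pooling cost).

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_routed_attention(query: Vector, keys: List[Vector], values: List[Vector],
                           block_size: int, top_k: int) -> Vector:
    """Hypothetical MoBA-style sketch: one query attends only to its top_k key blocks."""
    n, dim = len(keys), len(query)
    # Partition token positions into contiguous blocks of at most block_size tokens.
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    # Gate: score each block by query . mean(keys in block).
    gate = []
    for block in blocks:
        centroid = [sum(keys[j][d] for j in block) / len(block) for d in range(dim)]
        gate.append(dot(query, centroid))
    # Keep the top_k blocks by gate score; tokens in other blocks are skipped entirely.
    chosen = sorted(range(len(blocks)), key=lambda b: gate[b], reverse=True)[:top_k]
    selected = [j for b in sorted(chosen) for j in blocks[b]]
    # Scaled dot-product attention restricted to the selected tokens.
    weights = softmax([dot(query, keys[j]) / math.sqrt(dim) for j in selected])
    return [sum(w * values[j][d] for w, j in zip(weights, selected))
            for d in range(len(values[0]))]


if __name__ == "__main__":
    q = [1.0, 0.0]
    K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [2.0, 2.0]]
    V = [[float(i), 1.0] for i in range(6)]
    # With top_k=1 the gate routes this query to the last block (tokens 4 and 5).
    print(block_routed_attention(q, K, V, block_size=2, top_k=1))
```

With `top_k` equal to the number of blocks, the routine reduces to ordinary full softmax attention, which is a convenient sanity check; a smaller `top_k` trades away low-scoring tokens for lower cost, in the spirit of filtering low-contributing tokens.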

Abstract

Ranking models have become an important part of modern personalized recommendation systems. However, significant challenges persist in handling high-cardinality, heterogeneous, and sparse feature spaces, particularly regarding model scalability and efficiency. We identify two key bottlenecks: (i) Representation Bottleneck: driven by the high cardinality and dynamic nature of features, model capacity is forced into sparsely activated embedding layers, leading to low-rank representations. This, in turn, triggers phenomena such as "One-Epoch" and "Interaction-Collapse," ultimately hindering model scalability. (ii) Computational Bottleneck: integrating all heterogeneous features into a unified model causes an explosion in the number of feature tokens, making traditional attention mechanisms computationally expensive and susceptible to attention dispersion. To dismantle these barriers, we introduce STORE, a unified and scalable token-based ranking framework built upon three core innovations: (1) Semantic Tokenization, which tackles feature heterogeneity and sparsity by decomposing high-cardinality sparse features into a compact set of stable semantic tokens; (2) Orthogonal Rotation Transformation, which rotates the subspace spanned by low-cardinality static features to enable more efficient and effective feature interactions; and (3) Efficient Attention, which filters out low-contributing tokens to improve computational efficiency while preserving model accuracy. Across extensive offline experiments and online A/B tests, our framework consistently improves prediction accuracy (online CTR by 2.71%, AUC by 1.195%) and training efficiency (1.84× throughput).
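The first two components can be illustrated with a pure-Python sketch. The summary does not say how STORE builds its semantic tokens or its rotation matrices, so this assumes (i) residual quantization over a dense embedding of a high-cardinality feature, a common way to derive semantic IDs, and (ii) fixed random orthogonal matrices applied to a low-cardinality static feature embedding to produce several rotated views; in the paper the rotations may well be learned. The codebooks, dimensions, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    # Index of the codeword with the smallest squared Euclidean distance to x.
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))


def semantic_ids(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Residual quantization: each level encodes what the previous levels left over."""
    residual = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def random_orthogonal(dim: int, rng: random.Random) -> List[Vector]:
    """Random orthogonal matrix: orthonormalize Gaussian rows with modified Gram-Schmidt."""
    rows: List[Vector] = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the (unlikely) degenerate case
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix: List[Vector], x: Vector) -> Vector:
    # Matrix-vector product; an orthogonal matrix preserves the norm of x.
    return [sum(m * a for m, a in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    # Toy two-level codebooks (hypothetical); a real system would learn them.
    codebooks = [
        [[0.0, 0.0], [10.0, 10.0]],
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    ]
    print(semantic_ids([10.9, 10.1], codebooks))  # [1, 1]

    # Several rotated "views" of one static-feature embedding.
    rng = random.Random(0)
    e = [1.0, -2.0, 3.0, 0.5]
    views = [rotate(random_orthogonal(4, rng), e) for _ in range(3)]
```

Items with nearby embeddings tend to share leading codes, which is one plausible reason semantic IDs are more stable than raw high-cardinality IDs. Orthogonal matrices preserve norms and pairwise inner products, so each rotated view carries the same information in a different basis, giving downstream interaction layers several distinct inputs instead of one.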

Paper Structure

This paper contains 16 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed STORE.
  • Figure 2: Scaling Laws Study of (a) Epoch Number (b) SID Number (c) Layer Number (d) Sparsity.