UniScale: Synergistic Entire Space Data and Model Scaling for Search Ranking

Liren Yu, Caiyuan Li, Feiyi Dong, Tao Zhang, Zhixuan Zhang, Dan Ou, Haihong Tang, Bo Zheng

Abstract

Recent advances in Large Language Models (LLMs) have inspired a surge of scaling-law research in industrial search, advertising, and recommendation systems. However, existing approaches focus mainly on architectural improvements and overlook the critical synergy between data and architecture design. We observe that scaling model parameters alone yields diminishing returns, i.e., the marginal performance gain steadily declines as model size increases, and that the performance degradation caused by complex heterogeneous data distributions often cannot be recovered through model design alone. To address these limitations, we propose UniScale, a novel co-design framework that jointly optimizes data and architecture to unlock the full potential of model scaling. UniScale has two core parts: (1) ES$^3$ (Entire-Space Sample System), a high-quality data scaling system that expands the training signal beyond conventional sampling strategies by drawing on both intra-domain request contexts, with global supervised signals constructed by hierarchical label attribution, and cross-domain samples aligned with the essence of user decision-making under a content exposure environment similar to that of the search domain; and (2) HHSFT (Heterogeneous Hierarchical Sample Fusion Transformer), a novel architecture designed to model the complex heterogeneous distribution of the scaled data and to harness entire-space user behavior data through Heterogeneous Hierarchical Feature Interaction and Entire Space User Interest Fusion, thereby surpassing the performance ceiling of structure-only model tuning. Extensive experiments demonstrate that UniScale achieves significant improvements through the synergistic co-design of data and architecture and exhibits scaling trends. Online A/B tests on a real-world e-commerce search platform further show gains of 1.70% in purchases and 2.04% in Gross Merchandise Volume (GMV).

Paper Structure (27 sections, 3 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Overall architecture of the proposed UniScale, which combines the Entire-Space Sample System (ES$^3$) for high-quality data scaling with the Heterogeneous Hierarchical Sample Fusion Transformer (HHSFT) for fully incorporating the user interests expressed in entire-space user behavior.
  • Figure 2: Overview of the Entire-Space Sample System (ES$^3$). ES$^3$ constructs a unified, bias-mitigated training set by synergistically integrating samples across the entire user behavior space. The framework comprises two core modules: (i) Intra-domain Sample and Label Expansion; (ii) Cross-domain Sample Searchification.
  • Figure 3: Overview of the Heterogeneous Hierarchical Sample Fusion Transformer (HHSFT) architecture. HHSFT is designed to exploit model and data scaling in ranking systems through two primary stages. First, the Heterogeneous Hierarchical Feature Interaction (HHFI) stage employs heterogeneous feature attention layers with token-specific projection matrices and FFNs to preserve domain-unique distributions, followed by a global feature attention layer to capture high-order cross-domain interactions. Second, the Entire Space User Interest Fusion (ESUIF) stage comprises two modules: Domain-Routed Expert Fusion (DREF), which uses sample routing constraints to disentangle shared knowledge from domain-specific patterns, and Domain-Aware Personalized Gated Attention (DAPGA), which adaptively modulates cross-domain information transfer via personalized gating.
  • Figure 4: AUC gain vs. dense parameter scale ratio.
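
The Figure 3 caption mentions heterogeneous feature attention layers with token-specific projection matrices. The sketch below illustrates that general idea only: each feature token gets its own query, key, and value matrices instead of sharing one set across all tokens. It is not the paper's implementation. The single attention head, the token count, the dimensions, the random initialization, and every function name here are assumptions chosen for illustration, and the token-specific FFNs and the global attention layer are omitted.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Hypothetical initializer: small uniform weights."""
    bound = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def heterogeneous_attention(tokens, w_q, w_k, w_v):
    """Single-head attention where token i is projected by its own
    w_q[i], w_k[i], w_v[i] rather than by matrices shared across tokens."""
    dim = len(tokens[0])
    queries = [matvec(w_q[i], x) for i, x in enumerate(tokens)]
    keys = [matvec(w_k[i], x) for i, x in enumerate(tokens)]
    values = [matvec(w_v[i], x) for i, x in enumerate(tokens)]

    outputs, all_weights = [], []
    for q in queries:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the value vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


rng = random.Random(0)
num_tokens, dim = 3, 4  # assumed: e.g. user, query, and item feature tokens
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
w_q = [random_matrix(rng, dim, dim) for _ in range(num_tokens)]
w_k = [random_matrix(rng, dim, dim) for _ in range(num_tokens)]
w_v = [random_matrix(rng, dim, dim) for _ in range(num_tokens)]

outputs, attn = heterogeneous_attention(tokens, w_q, w_k, w_v)
print(len(outputs), len(outputs[0]))
```

Keeping separate matrices per token lets features drawn from different domains be projected differently before they interact, at the cost of a parameter count that grows linearly with the number of tokens.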