GoalRank: Group-Relative Optimization for a Large Ranking Model

Kaike Zhang, Xiaobei Wang, Shuchang Liu, Hailan Yang, Xiang Li, Lantao Hu, Han Li, Qi Cao, Fei Sun, Kun Gai

TL;DR

This work reframes ranking in recommender systems from a two-stage Generator–Evaluator paradigm to a generator-only one-stage approach. It proves that a sufficiently large generator can strictly reduce the approximation error to the optimal ranking policy $\pi^*$ and exhibit scaling laws as size grows, surpassing multi-generator ensembles. By deriving a group-relative optimization principle and leveraging a reward model trained on real user feedback, the authors present GoalRank, a generator-only ranker trained to align with a practical reference policy within constructed groups. Offline benchmarks across public and industrial datasets, plus large-scale online A/B tests, demonstrate that GoalRank consistently outperforms state-of-the-art baselines and benefits significantly from scaling, with production deployment highlighting substantial gains in engagement and efficiency.

Abstract

Mainstream ranking approaches typically follow a Generator-Evaluator two-stage paradigm, where a generator produces candidate lists and an evaluator selects the best one. Recent work has attempted to enhance performance by expanding the number of candidate lists, for example, through multi-generator settings. However, ranking involves selecting a recommendation list from a combinatorially large space. Simply enlarging the candidate set remains ineffective, and performance gains quickly saturate. At the same time, recent advances in large recommendation models have shown that end-to-end one-stage models can achieve promising performance with the expectation of scaling laws. Motivated by this, we revisit ranking from a generator-only one-stage perspective. We theoretically prove that, for any (finite Multi-)Generator-Evaluator model, there always exists a generator-only model that achieves strictly smaller approximation error to the optimal ranking policy, while also enjoying scaling laws as its size increases. Building on this result, we derive an evidence upper bound of the one-stage optimization objective, from which we find that one can leverage a reward model trained on real user feedback to construct a reference policy in a group-relative manner. This reference policy serves as a practical surrogate of the optimal policy, enabling effective training of a large generator-only ranker. Based on these insights, we propose GoalRank, a generator-only ranking framework. Extensive offline experiments on public benchmarks and large-scale online A/B tests demonstrate that GoalRank consistently outperforms state-of-the-art methods.
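As a rough illustration of the group-relative idea described in the abstract, the sketch below turns reward-model scores for one group of candidate lists into a reference distribution and measures how far a generator is from it. The abstract does not give the exact construction, so everything here is an assumption: the function names `group_relative_reference` and `cross_entropy_to_reference`, the `temperature` parameter, and the choice of standardizing rewards within the group before applying a softmax are all hypothetical, and not the authors' implementation.

```python
import math


def group_relative_reference(rewards, temperature=1.0):
    """Map reward scores for one group of candidate lists to a reference distribution.

    Assumption: rewards are standardized within the group (zero mean, unit
    variance) and passed through a softmax. The paper's actual construction
    may differ.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # Group-relative scores: only the ranking of a list relative to its group matters.
    logits = [(r - mean) / (std + 1e-8) / temperature for r in rewards]
    # Numerically stable softmax over the group.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_to_reference(ref_probs, gen_log_probs):
    """Cross-entropy of the generator's log-probabilities under the reference.

    Minimizing this over the generator is equivalent to minimizing
    KL(reference || generator) restricted to the group.
    """
    return -sum(p * lp for p, lp in zip(ref_probs, gen_log_probs))


# Hypothetical reward-model scores for three candidate lists in one group.
rewards = [0.2, 0.9, 0.5]
reference = group_relative_reference(rewards)

# A generator that is currently uniform over the group.
uniform_log_probs = [math.log(1.0 / 3.0)] * 3
loss = cross_entropy_to_reference(reference, uniform_log_probs)
```

Because the scores are centered and scaled within each group, shifting or rescaling all rewards in a group leaves the reference distribution essentially unchanged, which is the property that makes the target "relative" rather than dependent on the reward model's absolute scale.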

Paper Structure

This paper contains 29 sections, 8 theorems, 65 equations, 4 figures, 5 tables.

Key Result

Theorem 1

Let $\alpha,\beta>0$ and let $k\in\mathbb{N}_{>0}$ be arbitrary. For the $k$-mixture policy space $\mathcal{C}_k^m(\alpha,\beta)$ of Definition 2, there exists a class of larger generators with an associated policy space such that

Figures (4)

  • Figure 1: Illustration of different ranking paradigms: (a) Generator-only; (b) Generator–Evaluator; (c) Multi-Generator–Evaluator; and (d) Performance trend with increasing number of generators.
  • Figure 2: Training pipeline of group-relative optimization for a large ranker, GoalRank.
  • Figure 3: Scaling performance of GoalRank and baselines on the Industry-0.1B dataset across model sizes from 1M to 0.1B parameters.
  • Figure 4: Online workflows.

Theorems & Definitions (21)

  • Definition 1: $(\alpha,\beta)$-bounded generator class
  • Definition 2: $k$-mixture $(\alpha,\beta)$-bounded policy space
  • Definition 3: Approximation distance (KL error)
  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Lemma 2
  • Remark 1: Finite domains
  • ...and 11 more