OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment

Tianci Liu, Ran Xu, Tony Yu, Ilgee Hong, Carl Yang, Tuo Zhao, Haoyu Wang

TL;DR

OpenRubrics introduces a scalable framework for generating rubrics as rewards, combining Contrastive Rubric Generation with preference-label consistency filtering to produce two kinds of criteria: hard rules and high-level principles. The two-stage Rubric-RM pipeline first generates rubrics and then uses them to predict pairwise preferences, yielding superior reward signals. Empirical results across eight benchmarks show that Rubric-RM outperforms strong baselines and improves policy quality in instruction-following and biomedical domains, with efficient decoding and rubric reuse. The work demonstrates that principled rubric signals can bridge costly human evaluation and automated reward modeling, enabling scalable LLM alignment. The HealthBench findings further illustrate the potential for domain transfer and the value of domain-tuned rubric signals for trustworthy AI.

Abstract

Reward modeling lies at the core of reinforcement learning from human feedback (RLHF), yet most existing reward models rely on scalar or pairwise judgments that fail to capture the multifaceted nature of human preferences. Recent studies have explored rubrics-as-rewards (RaR), which uses structured natural-language criteria to capture multiple dimensions of response quality. However, producing rubrics that are both reliable and scalable remains a key challenge. In this work, we introduce OpenRubrics, a diverse, large-scale collection of (prompt, rubric) pairs for training rubric-generation and rubric-based reward models. To elicit discriminative and comprehensive evaluation signals, we introduce Contrastive Rubric Generation (CRG), which derives both hard rules (explicit constraints) and principles (implicit qualities) by contrasting preferred and rejected responses. We further improve reliability by enforcing preference-label consistency via rejection sampling to remove noisy rubrics. Across multiple reward-modeling benchmarks, our rubric-based reward model, Rubric-RM, surpasses strong size-matched baselines by 6.8%. These gains transfer to policy models on instruction-following and biomedical benchmarks. Our results show that rubrics provide scalable alignment signals that narrow the gap between costly human evaluation and automated reward modeling, enabling a new principle-driven paradigm for LLM alignment.
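
The preference-label consistency step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PreferencePair` type, the `Judge` signature, the comma-separated rubric encoding, and the keyword-counting `toy_judge` are all hypothetical stand-ins chosen so the example runs without a model. The idea shown is only that a candidate rubric is kept when a rubric-conditioned judge, given that rubric, picks the response that was actually labeled as preferred.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    """One labeled comparison (hypothetical container, not from the paper)."""
    prompt: str
    chosen: str    # response labeled as preferred
    rejected: str  # response labeled as dispreferred


# Hypothetical judge signature: (prompt, rubric, response_a, response_b) -> "A" or "B".
Judge = Callable[[str, str, str, str], str]


def filter_consistent_rubrics(
    pair: PreferencePair, candidate_rubrics: List[str], judge: Judge
) -> List[str]:
    """Keep only rubrics under which the judge agrees with the preference label."""
    kept = []
    for rubric in candidate_rubrics:
        # The chosen response is always passed in slot A, so "A" means agreement.
        verdict = judge(pair.prompt, rubric, pair.chosen, pair.rejected)
        if verdict == "A":
            kept.append(rubric)
    return kept


def toy_judge(prompt: str, rubric: str, response_a: str, response_b: str) -> str:
    """Stand-in for an LLM judge (assumption): the rubric is a comma-separated
    list of terms, and the response mentioning more of them wins; ties go to B."""
    terms = [t.strip().lower() for t in rubric.split(",") if t.strip()]
    score_a = sum(t in response_a.lower() for t in terms)
    score_b = sum(t in response_b.lower() for t in terms)
    return "A" if score_a > score_b else "B"


pair = PreferencePair(
    prompt="Explain photosynthesis in one sentence.",
    chosen="Plants use sunlight, water, and carbon dioxide to make glucose and oxygen.",
    rejected="Plants eat dirt to grow.",
)
candidates = ["sunlight, carbon dioxide", "dirt", "grow"]
kept = filter_consistent_rubrics(pair, candidates, toy_judge)
print(kept)
```

In this toy run only the first candidate survives, because the other two would lead the judge to favor the rejected response. In practice the judge would be a language model prompted with the rubric, and the filter would be applied over many sampled rubrics per prompt.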

Paper Structure

This paper contains 48 sections, 7 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Overall Framework for Synthetic Rubric Generation in OpenRubrics.
  • Figure 2: Statistics Overview of OpenRubrics.
  • Figure 3: The T-SNE plot for the embeddings of prompts.
  • Figure 4: Comparison of trained policy models on IFBench. Results of baselines except RLCF are from bhaskar2025language. We evaluate RLCF with its official checkpoint.
  • Figure 5: Comparison of different judges, reward models, and trained policy models on HealthBench.