
CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Dengcan Liu, Fengkai Yang, Xiaohan Wang, Shurui Yan, Jiajun Chai, Jiahao Li, Yikun Ban, Zhendong Mao, Wei Lin, Guojun Yin

TL;DR

CDRRM (Contrast-Driven Rubric Reward Model) is a framework built on a novel Contrast-then-Synthesis paradigm for high-quality rubric generation and guided preference judgment. It achieves state-of-the-art performance across diverse domains and effectively mitigates common LLM-evaluator biases such as verbosity and position bias.

Abstract

Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpretability and heavy reliance on costly expert annotations. While recent rubric-based approaches enhance evaluation transparency, they lack systematic quality control: they yield noisy and redundant criteria, fail to mitigate persistent biases (e.g., verbosity, position) in LLM evaluators, and create a scalability-reliability trade-off. To address these limitations, we propose CDRRM (Contrast-Driven Rubric Reward Model), a framework built on a novel Contrast-then-Synthesis paradigm for high-quality rubric generation and guided preference judgment. CDRRM first conducts multi-dimensional contrastive profiling on preference pairs to identify causal discriminative factors, then synthesizes these insights into compact, context-aware rubrics to guide preference judgments. Extensive experiments on three authoritative benchmarks (RewardBench, RMBench, RMB) demonstrate that CDRRM achieves state-of-the-art performance across diverse domains and effectively mitigates the aforementioned evaluation biases. Notably, our approach delivers exceptional data efficiency: training the rubric generator on only 3k high-quality samples empowers a frozen pre-trained judge model to outperform fully fine-tuned baselines. This work offers a scalable, interpretable, and data-efficient path for reward modeling.
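To make the Contrast-then-Synthesis data flow concrete, here is a toy Python sketch. It is not the paper's implementation. In CDRRM both steps are carried out by an LLM; this sketch only mimics their shape.

It assumes that each response in a preference pair has already been profiled into per-dimension scores. The dimension names, the 0-10 scale, the `min_gap` threshold, and the `max_items` cap are all hypothetical. The contrast step keeps only dimensions on which the preferred and rejected responses clearly differ. The synthesis step ranks those gaps and keeps a compact rubric.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    dimension: str
    gap: float  # how strongly this dimension separates the pair


def contrast(chosen: dict[str, float], rejected: dict[str, float],
             min_gap: float = 2.0) -> list[RubricItem]:
    """Toy contrast step: keep shared dimensions whose scores differ by >= min_gap."""
    items = []
    for dim in chosen.keys() & rejected.keys():
        gap = abs(chosen[dim] - rejected[dim])
        if gap >= min_gap:
            items.append(RubricItem(dim, gap))
    return items


def synthesize(items: list[RubricItem], max_items: int = 3) -> list[str]:
    """Toy synthesis step: rank by gap (ties broken by name) and keep a short rubric."""
    ranked = sorted(items, key=lambda it: (-it.gap, it.dimension))
    return [it.dimension for it in ranked[:max_items]]


# Hypothetical profiles for a GCD question (cf. Figure 1); the scores are invented.
chosen = {"correctness": 9, "handles_zero_input": 8, "clarity": 7, "conciseness": 8}
rejected = {"correctness": 3, "handles_zero_input": 2, "clarity": 6, "conciseness": 5}

rubric = synthesize(contrast(chosen, rejected), max_items=2)
print(rubric)  # → ['correctness', 'handles_zero_input']
```

Dimensions on which both responses score about the same (here `clarity`) are dropped because they cannot explain the preference. This loosely reflects the contrast between redundant direct-prompting rubrics and concise contrast-derived ones shown in Figure 1.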

Paper Structure

This paper contains 23 sections, 14 equations, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: An illustrative example of rubric generation for a Greatest Common Divisor (GCD) task, contrasting rubrics from direct prompting (right, redundant and potentially misleading) with those from our Contrast-then-Synthesis paradigm (left, concise and effective). The bottom-left panel shows statistics on the number of generated rubrics with respect to single preferences.
  • Figure 2: The CDRRM framework. (Top) The Contrast-then-Synthesis paradigm synthesizes evidence-based rubrics via contrastive analysis of preference pairs. (Bottom) These rubrics, paired with synthesized rubric-grounded justifications, supervise the training of a Rubric Generator (to automate context-aware criterion synthesis) and a Judge Model (to generate rubric-aligned justifications for precise preference predictions).
  • Figure 3: Impact of training data size on model performance. Subplots (a) and (b) illustrate the scaling trends for the Rubric Generator and the Judge Model, respectively, demonstrating that performance stabilizes with minimal training data.
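The abstract credits rubric-guided judgment with reducing position bias. A common way to check this for any pairwise judge is to query the judge in both presentation orders and count a pair as correct only when the chosen response wins both times. This is not necessarily the protocol used in the paper.

The sketch below makes several assumptions:

- It uses a hypothetical `judge(prompt, rubric, response_a, response_b)` callable that returns "A" or "B".
- `always_a` is a stub with pure position bias.
- `keyword_rubric_judge` is a toy keyword-matching stand-in for the rubric-guided Judge Model in Figure 2. It produces no justifications.

```python
from typing import Callable, Sequence

# Hypothetical judge interface: (prompt, rubric, response_a, response_b) -> "A" or "B".
Judge = Callable[[str, Sequence[str], str, str], str]


def pairwise_eval(judge: Judge, pairs: Sequence[tuple]) -> dict:
    """Evaluate a judge on (prompt, rubric, chosen, rejected) tuples in both orders."""
    correct = 0
    inconsistent = 0
    for prompt, rubric, chosen, rejected in pairs:
        first = judge(prompt, rubric, chosen, rejected)   # chosen shown as A
        second = judge(prompt, rubric, rejected, chosen)  # chosen shown as B
        # Correct only if the chosen response wins in both orders.
        if first == "A" and second == "B":
            correct += 1
        # The same letter in both orders means the verdict followed position, not content.
        if first == second:
            inconsistent += 1
    n = len(pairs)
    return {"accuracy": correct / n, "position_inconsistency": inconsistent / n}


def always_a(prompt, rubric, a, b):
    """Stub with pure position bias: always prefers the first response."""
    return "A"


def keyword_rubric_judge(prompt, rubric, a, b):
    """Toy stand-in for an LLM judge: counts rubric items each response satisfies."""
    score_a = sum(item in a for item in rubric)
    score_b = sum(item in b for item in rubric)
    return "A" if score_a >= score_b else "B"  # ties fall back to "A"


pairs = [
    ("Compute gcd(0, 12).", ["12", "gcd(0, n) = n"],
     "Since gcd(0, n) = n, the answer is 12.", "The answer is 0."),
    ("Compute gcd(18, 24).", ["6"],
     "The answer is 6.", "The answer is 12."),
]

print(pairwise_eval(always_a, pairs))              # → {'accuracy': 0.0, 'position_inconsistency': 1.0}
print(pairwise_eval(keyword_rubric_judge, pairs))  # → {'accuracy': 1.0, 'position_inconsistency': 0.0}
```

Requiring agreement across both orders means a judge cannot score well by exploiting presentation order. The position-biased stub gets zero accuracy, while a judge that actually applies the rubric is unaffected by the swap.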