CAMEL: Confidence-Gated Reflection for Reward Modeling

Zirui Zhu; Hailun Xu; Yang Luo; Yong Liu; Kanchan Sarkar; Kun Xu; Yang You

TL;DR

This paper proposes CAMEL, a confidence-gated reflection framework that first makes a lightweight single-token preference decision and invokes reflection only for low-confidence instances, establishing a strictly better accuracy-efficiency Pareto frontier.

Abstract

Reward models play a fundamental role in aligning large language models with human preferences. Existing methods predominantly follow two paradigms: scalar discriminative preference models, which are efficient but lack interpretability, and generative judging models, which offer richer reasoning at the cost of higher computational overhead. We observe that the log-probability margin between verdict tokens strongly correlates with prediction correctness, providing a reliable proxy for instance difficulty without additional inference cost. Building on this insight, we propose CAMEL, a confidence-gated reflection framework that performs a lightweight single-token preference decision first and selectively invokes reflection only for low-confidence instances. To induce effective self-correction, we train the model via reinforcement learning with counterfactual prefix augmentation, which exposes the model to diverse initial verdicts and encourages genuine revision. Empirically, CAMEL achieves state-of-the-art performance on three widely used reward-model benchmarks with 82.9% average accuracy, surpassing the best prior model by 3.2% and outperforming 70B-parameter models using only 14B parameters, while establishing a strictly better accuracy-efficiency Pareto frontier.
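The gating idea described in the abstract can be illustrated with a minimal sketch. It assumes the confidence score is the absolute difference between the log-probabilities of the two verdict tokens, and that a threshold `tau` separates confident from uncertain instances; the function names, the `reflect` callback, and the exact margin definition are hypothetical and are not taken from the paper.

```python
from typing import Callable, Tuple


def confidence_margin(logp_a: float, logp_b: float) -> float:
    """Assumed confidence score: absolute log-probability margin
    between the two verdict tokens ("A" vs. "B")."""
    return abs(logp_a - logp_b)


def gated_judge(
    logp_a: float,
    logp_b: float,
    tau: float,
    reflect: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (final_verdict, reflected).

    The single-token verdict is accepted when the margin reaches tau.
    Otherwise the hypothetical `reflect` callback is invoked with the
    initial verdict and its answer becomes the final verdict.
    """
    initial = "A" if logp_a >= logp_b else "B"
    if confidence_margin(logp_a, logp_b) >= tau:
        return initial, False  # confident: no extra tokens generated
    return reflect(initial), True  # uncertain: pay for reflection
```

Under this sketch, raising `tau` sends more instances to reflection, while `tau = 0` reduces to the single-token decision alone.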

Paper Structure (33 sections, 8 equations, 8 figures, 2 tables, 2 algorithms)

Introduction
Preliminaries
Problem Formulation.
Scalar Reward Models.
Generative Reward Models.
Method
Confidence Score and Accuracy
CAMEL Judging Prompt and Gating Rule
Reinforcement Learning for Reflection
Training and Inference Pipeline
Experiments
Experimental Setup
Data and Models.
Evaluation Benchmarks.
Main Results
...and 18 more sections

Figures (8)

Figure 1: Given a query $q$ and two candidate responses $(r_a, r_b)$, a scalar reward model assigns scores to the responses and induces a pairwise preference, whereas a generative reward model produces a textual judgment when outputting the preferred response.
Figure 2: Confidence score distribution and its relationship with prediction accuracy on Skywork-Reward-Preference-80K using Qwen3-14B. (a) Distribution of confidence scores for correct and incorrect predictions. Correct predictions exhibit a heavier tail toward higher confidence scores, while incorrect predictions are concentrated in the low-confidence region. (b) Accuracy as a function of confidence score. Each point represents the accuracy within a binned confidence interval, with color intensity indicating sample count. Predictions with higher confidence scores are substantially more likely to be correct.
Figure 3: CAMEL Preference Judgment Prompt.
Figure 4: Accuracy vs. Average Output Tokens Trade-off on RewardBench/RM-Bench. The Pareto curve illustrates the performance-efficiency trade-off of CAMEL under varying confidence thresholds $\tau$. CAMEL-Fast uses only token-level confidence without reflection, while CAMEL-Reflection applies full reflection to all samples. By adaptively selecting when to reflect based on confidence, CAMEL achieves superior accuracy with significantly fewer tokens compared to RM-R1 baselines. The color gradient indicates the confidence threshold $\tau$ from low to high.
Figure 5: Confidence score distribution before and after CAMEL training.
...and 3 more figures
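The accuracy versus output-token trade-off described in the Figure 4 caption can be traced by sweeping the confidence threshold $\tau$. The sketch below is an illustration only: it assumes that per-instance confidence scores, the correctness of the fast and reflected verdicts, and the reflection token counts have already been collected, and that a fast verdict costs one output token. All names and the cost model are assumptions, not the paper's evaluation code.

```python
from typing import List, Sequence, Tuple


def sweep_thresholds(
    conf: Sequence[float],
    fast_correct: Sequence[bool],
    reflect_correct: Sequence[bool],
    reflect_tokens: Sequence[int],
    taus: Sequence[float],
    fast_tokens: int = 1,
) -> List[Tuple[float, float, float]]:
    """For each tau, return (tau, accuracy, average output tokens).

    Instances with confidence >= tau keep the fast verdict; the rest
    are reflected on and pay the reflection tokens on top of the
    fast verdict token.
    """
    n = len(conf)
    points = []
    for tau in taus:
        correct = 0
        tokens = 0
        for c, fc, rc, rt in zip(conf, fast_correct, reflect_correct, reflect_tokens):
            if c >= tau:
                correct += fc
                tokens += fast_tokens
            else:
                correct += rc
                tokens += fast_tokens + rt
        points.append((tau, correct / n, tokens / n))
    return points
```

In this sketch, a threshold of zero corresponds to the fast-only setting and a threshold above every confidence score corresponds to reflecting on all samples, which mirrors the two endpoints the caption calls CAMEL-Fast and CAMEL-Reflection.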
