RuCL: Stratified Rubric-Based Curriculum Learning for Multimodal Large Language Model Reasoning

Yukun Chen; Jiaming Li; Longze Chen; Ze Gong; Jingpeng Li; Zhen Qin; Hengyu Chang; Ancheng Xu; Zhihao Yang; Hamid Alinejad-Rokny; Qiang Qu; Bo Zheng; Min Yang

TL;DR

The paper proposes Stratified Rubric-based Curriculum Learning (RuCL), a framework that reformulates curriculum learning by shifting the focus from data selection to reward design and by dynamically adjusting rubric weights during training.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a prevailing paradigm for enhancing reasoning in Multimodal Large Language Models (MLLMs). However, relying solely on outcome supervision risks reward hacking, where models learn spurious reasoning patterns to satisfy final answer checks. While recent rubric-based approaches offer fine-grained supervision signals, they suffer from high computational costs of instance-level generation and inefficient training dynamics caused by treating all rubrics as equally learnable. In this paper, we propose Stratified Rubric-based Curriculum Learning (RuCL), a novel framework that reformulates curriculum learning by shifting the focus from data selection to reward design. RuCL generates generalized rubrics for broad applicability and stratifies them based on the model's competence. By dynamically adjusting rubric weights during training, RuCL guides the model from mastering foundational perception to tackling advanced logical reasoning. Extensive experiments on various visual reasoning benchmarks show that RuCL yields a remarkable +7.83% average improvement over the Qwen2.5-VL-7B model, achieving a state-of-the-art accuracy of 60.06%.
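The stratification step mentioned in the abstract can be illustrated with a small sketch. It is not the paper's procedure: it assumes that "competence" is measured as the fraction of sampled responses that satisfy each rubric, and the function name `stratify_rubrics`, the 0.5 `threshold`, and the rubric names in the usage example are all hypothetical.

```python
from typing import Dict, List, Tuple


def stratify_rubrics(
    pass_records: Dict[str, List[bool]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split rubrics into (foundational, advanced) tiers by observed pass rate."""
    easy: List[str] = []
    hard: List[str] = []
    for name, outcomes in pass_records.items():
        # A rubric with no observations is treated as hard (an assumption).
        rate = sum(outcomes) / len(outcomes) if outcomes else 0.0
        (easy if rate >= threshold else hard).append(name)
    return easy, hard


# Hypothetical rubric names and judge outcomes, for illustration only.
records = {
    "reads_chart_values": [True, True, False, True],
    "multi_step_deduction": [False, False, True, False],
}
foundational, advanced = stratify_rubrics(records)
```

With these toy records, the rubric the model already satisfies most of the time lands in the foundational tier and the other in the advanced tier.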

Paper Structure (33 sections, 17 equations, 3 figures, 7 tables)

Introduction
Related Work
Stratified Rubric-based Curriculum Learning (RuCL)
Problem Formulation
Phase I: Generalized Rubric Construction and Stratification
Phase II: Dynamic Curriculum Learning
(1) Stabilization Phase
(2) Curriculum Ramp-up
(3) Advanced Consolidation
Experiments
Experiment Setup
Main Results
Mathematical Reasoning Performance
Ablation Study
Conclusion
...and 18 more sections

Figures (3)

Figure 1: Comparison of reward paradigms. We move beyond (A) outcome-only signals and (B) unstructured dense feedback. (C) Our RuCL framework organizes rubrics into a stratified curriculum, aligning reward complexity with the model's progressive learning stages.
Figure 2: Overview of Stratified Rubric-based Curriculum Learning (RuCL). The framework proceeds in two stages: (Top) Generalized Rubric Construction and Stratification, where evaluation rubrics are generated and categorized into Foundational ($\mathcal{R}_{\text{easy}}$) and Advanced ($\mathcal{R}_{\text{hard}}$) tiers based on empirical difficulty. (Bottom) Dynamic Curriculum Learning, where the rubric-based reward is synthesized via a dynamic weighting mechanism controlled by a scheduler. By adjusting the weight $\lambda$ based on real-time performance, RuCL progressively shifts the optimization focus from mastering basic skills to tackling complex reasoning.
Figure 3: Left: Training dynamics of Foundational (blue) and Advanced (red) rubric rewards. Middle: Ablation study results on rubric aggregation and scheduling strategies. Right: Sensitivity analysis of the reward balancing hyperparameter.
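The dynamic weighting described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the source only says that a scheduler adjusts the weight $\lambda$ based on real-time performance. The convex combination of the two tier scores, the class name `LambdaScheduler`, the threshold `tau`, the increment `step`, and the cap `lam_max` are all hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass
class LambdaScheduler:
    """Hypothetical scheduler for the rubric weight lambda (all defaults are assumptions)."""

    tau: float = 0.7      # assumed foundational-score threshold that unlocks ramp-up
    step: float = 0.05    # assumed increment applied per update once unlocked
    lam_max: float = 0.8  # assumed cap held once the ramp-up is complete
    lam: float = 0.0      # current weight on the advanced tier

    def update(self, easy_score: float) -> float:
        """Raise lambda only when the foundational score meets the threshold."""
        if easy_score >= self.tau:
            self.lam = min(self.lam_max, self.lam + self.step)
        return self.lam


def rubric_reward(easy_score: float, hard_score: float, lam: float) -> float:
    """Assumed convex mix: weight shifts from foundational to advanced as lam grows."""
    return (1.0 - lam) * easy_score + lam * hard_score


scheduler = LambdaScheduler()
lam_before = scheduler.update(easy_score=0.5)  # below tau: lambda stays at 0
lam_after = scheduler.update(easy_score=0.9)   # at or above tau: lambda starts ramping
```

One way to read this sketch against the outline's three phases: $\lambda$ stays at zero while foundational scores are below the threshold, rises step by step once they exceed it, and is then held at its cap. That mapping is an interpretation of the section names, not something the source states.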
