Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

Boren Hu, Xiao Liu, Boci Peng, Xinping Zhao, Xiaoran Shang, Yun Zhu, Lijun Wu

TL;DR

This work introduces a Bidirectional Curriculum Generation framework that optimizes the learning trajectory, outperforming baselines on mathematical reasoning while using substantially fewer instruction samples.

Abstract

Enhancing mathematical reasoning in Large Language Models typically demands massive datasets, yet data efficiency remains a critical bottleneck. While Curriculum Learning attempts to structure this process, standard unidirectional approaches (simple-to-complex) suffer from inefficient sample utilization: they blindly escalate complexity even when foundational gaps persist, leading to wasted computation on unsolvable problems. To maximize the instructional value of every training sample, we introduce a novel Bidirectional Curriculum Generation framework. Unlike rigid trajectories, our multi-agent ecosystem mimics adaptive pedagogy to establish a closed feedback loop. It dynamically generates data by either complicating problems to challenge the model or, crucially, simplifying them to repair specific reasoning failures. This mechanism ensures that the model consumes only the most effective data at any given stage. Grounded in the Optimal Pacing Theorem, our approach optimizes the learning trajectory, significantly outperforming baselines while achieving superior reasoning performance with substantially fewer instruction samples.
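The closed feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Problem` class, the integer difficulty levels, the one-level step size, and the `curriculum_step` function are all hypothetical names and choices introduced only to show the bidirectional decision (complicate after a success, simplify after a failure).

```python
from dataclasses import dataclass


@dataclass
class Problem:
    text: str
    difficulty: int  # hypothetical integer difficulty level (assumption)


def curriculum_step(problem: Problem, solved: bool,
                    min_level: int = 1, max_level: int = 10):
    """Pick the direction of the next generated problem.

    Assumption: a solved problem is complicated by one level and a failed
    problem is simplified by one level, clamped to [min_level, max_level].
    """
    direction = "complicate" if solved else "simplify"
    delta = 1 if solved else -1
    new_level = max(min_level, min(max_level, problem.difficulty + delta))
    return direction, new_level
```

In a real pipeline the returned direction would presumably be handed to a generator agent that rewrites the problem; here it is only a label plus a target level.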

Paper Structure (37 sections, 1 theorem, 42 equations, 4 figures, 14 tables)

This paper contains 37 sections, 1 theorem, 42 equations, 4 figures, 14 tables.

Key Result

Theorem 1

Let the model's capability level at time $t$ be $c_t$, and let the sample difficulty be $d$. There exists an optimal difficulty interval $[c_t - \varepsilon, c_t + \varepsilon]$ such that, when sampling within this interval, the expected gradient norm of the student model parameters $\theta$ is maximized.
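As a rough illustration of what sampling within the interval $[c_t - \varepsilon, c_t + \varepsilon]$ could look like in practice, the sketch below filters a pool of candidate samples by difficulty. It assumes that difficulty and capability are scalars on the same scale and that each sample is a `(name, difficulty)` pair. The function names and that representation are hypothetical and do not come from the paper.

```python
def in_pacing_window(difficulty: float, capability: float, epsilon: float) -> bool:
    """True when d lies in [c_t - epsilon, c_t + epsilon]."""
    return abs(difficulty - capability) <= epsilon


def select_samples(samples, capability: float, epsilon: float):
    """Keep only the (name, difficulty) pairs inside the pacing window."""
    return [s for s in samples if in_pacing_window(s[1], capability, epsilon)]


# Illustrative pool with made-up difficulty scores (not data from the paper).
pool = [("easy", 0.2), ("matched", 0.5), ("hard", 0.9)]
selected = select_samples(pool, capability=0.5, epsilon=0.1)
```

With this toy pool, only the sample whose difficulty is close to the current capability survives the filter; samples that are far too easy or far too hard are discarded.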

Figures (4)

  • Figure 1: Comparison of mathematical reasoning performance across varying data scales. The x-axis (log-scale) represents the number of training samples, and the y-axis shows the average performance across six benchmarks. The dashed lines represent the fitted scaling laws for our method (purple) and baselines (gray).
  • Figure 2: The pipeline of Bidirectional Curriculum with Multi-Agents for Data-Efficient Math Reasoning.
  • Figure 3: The diversity distribution of the generated datasets.
  • Figure 4: The difficulty distribution of the generated datasets.

Theorems & Definitions (1)

  • Theorem 1: Optimal Pacing Theorem