Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark
Unggi Lee, Jaeyong Lee, Jiyeong Bae, Yeil Jeong, Junbo Koh, Gyeonggeon Lee, Gunho Lee, Taekyung Ahn, Hyeoncheol Kim
TL;DR
Pedagogy-R1 addresses the gap between the strong reasoning of large reasoning models (LRMs) and the need for pedagogically coherent teaching behavior. It introduces three components: a distillation-based training pipeline; the Well-balanced Educational Benchmark (WBEB), which covers subject knowledge (SK), pedagogical knowledge (PK), knowledge tracing (KT), automated essay scoring (AES), and teacher decision-making (DM); and the Chain-of-Pedagogy (CoP) prompting strategy for eliciting teacher-like reasoning. Empirical results show that Pedagogy-R1 achieves more balanced and educationally aligned performance than standard baselines, with notable gains in pedagogical knowledge, knowledge tracing, and instructional decision-making, while preserving reasonable subject knowledge. The work offers practical implications for deploying LRMs in classrooms and educational platforms, supported by open datasets and a mixed-method evaluation that combines quantitative metrics with grounded theory–based qualitative analysis.
Abstract
Recent advances in large reasoning models (LRMs) show strong performance in structured domains such as mathematics and programming; however, they often lack pedagogical coherence and realistic teaching behaviors. To bridge this gap, we introduce Pedagogy-R1, a framework that adapts LRMs for classroom use through three innovations: (1) a distillation-based pipeline that filters and refines model outputs for instruction-tuning, (2) the Well-balanced Educational Benchmark (WBEB), which evaluates performance across subject knowledge, pedagogical knowledge, knowledge tracing, essay scoring, and teacher decision-making, and (3) a Chain-of-Pedagogy (CoP) prompting strategy for generating and eliciting teacher-style reasoning. Our mixed-method evaluation combines quantitative metrics with qualitative analysis, providing the first systematic assessment of LRMs' pedagogical strengths and limitations.
