Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation
Zhenwen Liang, Wenhao Yu, Tanmay Rajpurohit, Peter Clark, Xiangliang Zhang, Ashwin Kalyan
TL;DR
This work introduces CEMAL, a knowledge-tracing-driven framework that distills math word problem solving from large language models into smaller, efficient student solvers by generating customized exercises. It combines an iterative training workflow, an LLM-generated exercise book, and targeted data augmentation that addresses the learner's weaknesses. The resulting solvers achieve state-of-the-art accuracy on MAWPS and ASDiv-a with far fewer parameters than LLM baselines, and perform robustly on SVAMP in both in-distribution (ID) and out-of-distribution (OOD) settings. Key findings: targeted exercise generation outperforms random generation strategies, a sizable exercise book makes validation more robust, and progressive augmentation beats one-shot data expansion. By aligning distillation with knowledge tracing and personalized learning, the approach has practical implications for educational AI, though it requires careful prompt design and quality control of the generated content.
Abstract
In this paper, we present a novel approach for distilling math word problem solving capabilities from large language models (LLMs) into smaller, more efficient student models. Our approach takes the student model's weaknesses into account and fosters a tailored learning experience by generating targeted exercises aligned with educational science principles, such as knowledge tracing and personalized learning. Concretely, we let GPT-3 be a math tutor and run two steps iteratively: 1) assessing the student model's current learning status on a GPT-generated exercise book, and 2) improving the student model by training it on tailored exercise samples generated by GPT-3. Experimental results show that our approach outperforms LLMs (e.g., GPT-3 and PaLM) in accuracy on three benchmarks while using significantly fewer parameters. Furthermore, we analyze each component of our method to verify its contribution.
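The two-step assess-then-train loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `ToyStudent`, `tutor_generate`, and the `skill` label on each problem are all hypothetical stand-ins. The "student" simply counts how many training examples it has seen per skill, and the "tutor" (which in the paper would be GPT-3 prompted with a failed problem) just emits same-skill variants. The point is the control flow: only the problems the student currently fails drive the next round of exercise generation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Problem:
    text: str
    skill: str  # hypothetical label for the reasoning pattern a problem exercises
    answer: float


@dataclass
class ToyStudent:
    """Hypothetical stand-in for a small solver: it answers a problem correctly
    once it has trained on at least `threshold` examples of that skill."""
    threshold: int = 2
    seen: dict = field(default_factory=dict)

    def solve(self, p: Problem):
        return p.answer if self.seen.get(p.skill, 0) >= self.threshold else None

    def train(self, batch):
        for p in batch:
            self.seen[p.skill] = self.seen.get(p.skill, 0) + 1


def tutor_generate(failed: Problem, k: int):
    """Hypothetical stand-in for prompting an LLM tutor with a failed problem and
    asking for k similar exercises; here it only copies the skill label."""
    return [Problem(f"{failed.text} (variant {i})", failed.skill, failed.answer + i)
            for i in range(k)]


def assess(student, exercise_book):
    """Step 1: return the exercise-book problems the student currently gets wrong."""
    return [p for p in exercise_book if student.solve(p) != p.answer]


def customized_training(student, exercise_book, rounds=5, k=1):
    """Alternate assessment and targeted training; return per-round accuracy."""
    history = []
    for _ in range(rounds):
        failed = assess(student, exercise_book)
        history.append((len(exercise_book) - len(failed)) / len(exercise_book))
        if not failed:
            break
        # Step 2: generate exercises only for the weaknesses found in step 1.
        new_data = [q for p in failed for q in tutor_generate(p, k)]
        student.train(new_data)
    return history


if __name__ == "__main__":
    book = [Problem("a", "add", 3), Problem("b", "sub", 1), Problem("c", "mul", 6)]
    student = ToyStudent(seen={"add": 2})
    print(customized_training(student, book, k=2))  # [0.3333333333333333, 1.0]
```

In this sketch the tutor never spends effort on the skill the student already masters ("add"), which mirrors, in miniature, why targeted generation can be more sample-efficient than random augmentation. A real system would replace `ToyStudent` with a trained solver and `tutor_generate` with an LLM call.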
