Training and Evaluating Language Models with Template-based Data Generation
Yifan Zhang
TL;DR
The paper addresses the data scarcity that limits mathematical reasoning in large language models (LLMs) by introducing Template-based Data Generation (TDG), which uses GPT-4 to write parameterized meta-templates that are then automatically instantiated into verifiable problems and solutions. In the proposed pipeline, problems are generated, verified programmatically through sandboxed code execution, and then used for Reinforcement Learning with Verifiable Rewards (RLVR). The authors present TemplateGSM, a dataset of over 7 million math problems, each with a code-based and a natural-language solution, publicly available for training, evaluation, and alignment research. The approach offers a scalable, objectively verifiable path to stronger reasoning, along with a blueprint for applying template-based data generation and verification to other reasoning-intensive domains.
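To make the generate-then-verify step concrete, the following is a minimal Python sketch of a single meta-template being instantiated and checked. Everything in it is illustrative: the notebook-purchase template, the functions `instantiate` and `verify`, and the value ranges are hypothetical and not taken from TemplateGSM. Running the solution code in a separate Python process with a timeout is only a lightweight stand-in for a real sandbox.

```python
import random
import subprocess
import sys

# Hypothetical slot values for one illustrative meta-template.
NAMES = ["Ava", "Ben", "Chloe", "Diego"]


def instantiate(rng: random.Random) -> dict:
    """Fill the (hypothetical) notebook-purchase template with random values.

    Returns the problem text, a code solution, and an independently computed answer.
    """
    name = rng.choice(NAMES)
    n = rng.randint(2, 9)          # number of notebooks
    price = rng.randint(1, 5)      # dollars per notebook
    cost = n * price
    # Constraint kept in the template: the bill must cover the cost,
    # so the change is never negative.
    bill = rng.choice([b for b in (20, 50, 100) if b >= cost])
    problem = (
        f"{name} buys {n} notebooks at ${price} each and pays with a ${bill} bill. "
        f"How much change does {name} receive?"
    )
    code = f"cost = {n} * {price}\nchange = {bill} - cost\nprint(change)\n"
    return {"problem": problem, "code": code, "expected": bill - cost}


def verify(sample: dict, timeout: float = 5.0) -> bool:
    """Run the solution code in a fresh Python process and compare its output.

    A separate process with a timeout is only a lightweight stand-in for a
    real sandbox; it does not restrict file or network access.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", sample["code"]],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    if result.returncode != 0:
        return False
    return result.stdout.strip() == str(sample["expected"])


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [instantiate(rng) for _ in range(3)]
    kept = [s for s in samples if verify(s)]  # keep only verified instances
    for s in kept:
        print(s["problem"], "->", s["expected"])
```

Keeping the value ranges and the validity constraint inside the template is what lets every instantiation remain a well-posed problem, and computing `expected` separately from the executed code gives the verifier an independent value to compare against.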
Abstract
The rapid advancement of large language models (LLMs) such as GPT-3, PaLM, and Llama has transformed natural language processing, demonstrating remarkable capabilities in understanding and generating language. However, a fundamental bottleneck persists: these models often struggle with tasks that require complex, multi-step reasoning, particularly mathematical problem-solving. This deficiency stems from a critical scarcity of large-scale, high-quality, domain-specific datasets for cultivating sophisticated reasoning abilities. To address this challenge, we introduce Template-based Data Generation (TDG), a scalable paradigm that uses a frontier LLM (GPT-4) to automatically generate parameterized meta-templates, which in turn synthesize a virtually unlimited stream of high-quality problems and solutions. Because GPT-4 writes the meta-templates and the templates in turn produce the problems, TDG goes beyond conventional data augmentation and supports diverse, structurally complex problems. Using this paradigm, we create TemplateMath Part I: TemplateGSM, a foundational dataset of over 7 million synthetically generated grade school math problems. Each problem is accompanied by a programmatically verifiable solution, so correctness can be checked automatically even at this scale. This resource not only addresses the data scarcity problem for supervised fine-tuning but also provides a robust mechanism for model alignment through Reinforcement Learning with Verifiable Rewards (RLVR). By offering a scalable solution to the data and verification bottleneck, TDG and TemplateGSM pave the way for LLMs with more powerful and reliable reasoning skills. Project Page: https://github.com/iiis-ai/TemplateMath
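In RLVR, the reward comes from checking a model's output against a known-correct answer rather than from a learned reward model. The Python sketch below shows one simple binary reward of that kind. It assumes the final answer is the last number in the text, which is a common convention but not necessarily the rule used with TemplateGSM; `extract_final_number` and `verifiable_reward` are hypothetical names.

```python
import re
from fractions import Fraction
from typing import Optional

# Assumed answer format: the final answer is the last number in the text.
# Commas are accepted only as thousands separators (e.g. "1,250.50").
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[Fraction]:
    """Return the last number in `text` as an exact Fraction, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return Fraction(matches[-1].replace(",", ""))


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the final numbers match exactly, else 0.0."""
    predicted = extract_final_number(completion)
    target = extract_final_number(reference)
    if predicted is None or target is None:
        return 0.0  # unparseable answers earn no reward
    return 1.0 if predicted == target else 0.0


if __name__ == "__main__":
    print(verifiable_reward("So the change is $14.", "14"))
```

Comparing `Fraction` values rather than floats makes "14", "14.0", and "14.00" count as the same answer without floating-point rounding issues.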
