Computational Blueprints: Generating Isomorphic Mathematics Problems with Large Language Models
Jeong-Hoon Kim, Jinwoo Nam, Geunsik Jo
TL;DR
This work defines Isomorphic Math Problem Generation (IMPG), the task of creating structurally consistent variants of source problems for education, and introduces Computational Blueprints for Isomorphic Twins (CBIT), a meta-generation framework designed to ensure mathematical correctness while scaling cost-effectively. Through successive frameworks (AIC-Batch, then BIT, then CBIT), the approach shifts from generating individual items to producing verifiable generator programs, enabling deterministic correctness checks and reproducible outputs. Empirical results show that CBIT achieves higher generation accuracy at much lower cost, and a deployment to thousands of learners shows a lower error rate than expert-authored items. The work provides a benchmark, a verification toolkit, and a practical pathway for deploying large banks of educational content, highlighting the potential of automated, reliable practice generation in mathematics education.
Abstract
Personalized mathematics education is growing rapidly, creating strong demand for large sets of similar practice problems. Yet existing studies on mathematics problem generation have focused on data augmentation for training neural language models rather than on direct educational deployment. To bridge this gap, we define a new task, Isomorphic Math Problem Generation (IMPG), which aims to produce structurally consistent variants of source problems. We then explore LLM-based frameworks for automatic IMPG through successive refinements and establish Computational Blueprints for Isomorphic Twins (CBIT). Using meta-level generation and template-based selective variation, CBIT achieves high mathematical correctness and structural consistency while reducing generation cost. Empirical results across refinements show that CBIT is superior in generation accuracy and cost-effectiveness at scale. Most importantly, CBIT-generated problems exhibited an error rate 17.8% lower than that of expert-authored items in a deployment to 6,732 learners on a commercial education platform, which yielded 186,870 interactions.
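To make the idea of generating a program rather than individual items more concrete, here is a toy sketch in Python. It is not CBIT's implementation. The template ("Solve a*x + b = c for x"), the names `linear_blueprint`, `Variant`, and `verify`, and the parameter ranges are all hypothetical. The sketch assumes one plausible reading of "template-based selective variation": only the numeric slots change while the problem structure stays fixed. The answer is drawn first and the right-hand side is derived from it, so each variant is correct by construction. A seeded generator makes every variant reproducible, and a separate substitution check verifies the stored answer rather than trusting it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One isomorphic twin: the rendered statement plus its parameters (hypothetical)."""
    a: int
    b: int
    c: int
    answer: int
    statement: str


def linear_blueprint(seed: int) -> Variant:
    """Hypothetical blueprint for the fixed template 'Solve a*x + b = c for x'.

    Only the numeric slots vary; the structure of the problem is fixed.
    The answer is drawn first and c is derived from it, so every variant
    has an integer solution by construction.
    """
    rng = random.Random(seed)      # same seed -> same variant (reproducible)
    a = rng.randint(2, 9)          # nonzero coefficient -> unique solution
    answer = rng.randint(-10, 10)
    b = rng.randint(-20, 20)
    c = a * answer + b             # derive c so the chosen answer is exact
    sign = "+" if b >= 0 else "-"
    statement = f"Solve {a}x {sign} {abs(b)} = {c} for x."
    return Variant(a, b, c, answer, statement)


def verify(v: Variant) -> bool:
    """Check the stored answer by substitution instead of trusting it."""
    return v.a != 0 and v.a * v.answer + v.b == v.c


if __name__ == "__main__":
    # Emit a few reproducible variants together with their verification result.
    for seed in range(3):
        v = linear_blueprint(seed)
        print(v.statement, "->", v.answer, verify(v))
```

In this toy setting, one reviewed program can emit arbitrarily many variants, and each can be checked mechanically. Real problem templates would need richer constraints than a single linear equation.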
