ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions

Phuoc Pham Van Long; Duc Anh Vu; Nhat M. Hoang; Xuan Long Do; Anh Tuan Luu

TL;DR

This work systematically evaluates ChatGPT as a pre-university math question generator in context-aware and context-unaware settings. It introduces TopicMath, an expert-curated collection of curricula covering 121 topics and 428 lessons, and PRE-UMATH, a dataset of 16,000 QA pairs assembled through prompting and expert verification. On the SVAMP, GSM8K, and MATH benchmarks, ChatGPT produces grammatical, context-relevant questions but cannot consistently generate difficult, multi-step ones. This gap is most visible in the context-aware, answer-aware regime, where fine-tuned baselines excel. In the context-unaware setting, TopicMath enables broad topic coverage but exposes problems with topic alignment and cross-domain consistency, which points to the need for careful prompting and possibly hybrid approaches. The findings offer practical guidance for educators and researchers who want to use LLMs such as ChatGPT for math question generation and curriculum design, and they identify limitations in multi-step reasoning and in understanding relationships between objects.
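The TL;DR describes two pieces of structure: a curriculum hierarchy (classes, topics, lessons) and a dataset built by prompting per lesson and keeping only expert-approved pairs. The Python sketch below illustrates that flow under stated assumptions. The `Topic` and `Lesson` classes, the level labels, and the `generate` and `expert_accepts` callables are hypothetical stand-ins; the paper's actual prompts, generation calls, and verification criteria are not described in this section.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A generated item: (question, answer). Hypothetical representation.
QAPair = Tuple[str, str]


@dataclass
class Lesson:
    name: str


@dataclass
class Topic:
    name: str
    level: str  # assumed labels: "elementary", "secondary", or "tertiary"
    lessons: List[Lesson] = field(default_factory=list)


def assemble_dataset(
    topics: List[Topic],
    generate: Callable[[Topic, Lesson], List[QAPair]],
    expert_accepts: Callable[[QAPair], bool],
) -> List[Dict[str, str]]:
    """Query the generator once per lesson and keep only expert-approved pairs."""
    dataset: List[Dict[str, str]] = []
    for topic in topics:
        for lesson in topic.lessons:
            for question, answer in generate(topic, lesson):
                if expert_accepts((question, answer)):
                    dataset.append({
                        "level": topic.level,
                        "topic": topic.name,
                        "lesson": lesson.name,
                        "question": question,
                        "answer": answer,
                    })
    return dataset


# Usage with stand-ins: a fixed "generator" and a trivial acceptance check
# that plays the role of the (human) expert verification step.
def fake_generate(topic: Topic, lesson: Lesson) -> List[QAPair]:
    return [("What is 1/2 + 1/4?", "3/4"), ("What is 1/3 + 1/3?", "")]


def has_answer(pair: QAPair) -> bool:
    return pair[1] != ""


topics = [Topic("Fractions", "elementary", [Lesson("Adding fractions")])]
records = assemble_dataset(topics, fake_generate, has_answer)
print(len(records))  # 1: the pair with an empty answer is rejected
```

Passing the generator and the acceptance check in as functions keeps the traversal independent of any particular LLM client or review process, so either can be swapped without touching the loop.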

Abstract

Mathematical questioning is crucial for assessing students' problem-solving skills. Because creating such questions manually requires substantial effort, automatic methods have been explored. Existing state-of-the-art models rely on fine-tuning strategies and struggle to generate questions that involve multiple steps of logical and arithmetic reasoning. Meanwhile, large language models (LLMs) such as ChatGPT have excelled at many NLP tasks involving logical and arithmetic reasoning. Nonetheless, their use for generating educational questions remains underexplored, especially in mathematics. To bridge this gap, we take the first step toward an in-depth analysis of ChatGPT in generating pre-university math questions. Our analysis covers two main settings: context-aware and context-unaware. In the context-aware setting, we evaluate ChatGPT on existing math question-answering benchmarks covering elementary, secondary, and tertiary classes. In the context-unaware setting, we evaluate ChatGPT on generating math questions for each lesson of the pre-university math curricula that we crawl. The crawl yields TopicMath, a comprehensive and novel collection of pre-university math curricula spanning 121 math topics and 428 lessons from elementary, secondary, and tertiary classes. Through this analysis, we aim to provide insight into the potential of ChatGPT as a math questioner.
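The two settings differ in what the model is conditioned on: a benchmark context (optionally with its target answer) versus a curriculum lesson with no context at all. The Python sketch below builds one prompt per setting to make that difference concrete. The prompt wording, the `ContextAwareItem` class, and both helper functions are hypothetical illustrations, not the prompts used in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextAwareItem:
    context: str                  # passage taken from a benchmark item
    answer: Optional[str] = None  # set only for the answer-aware variant


def context_aware_prompt(item: ContextAwareItem) -> str:
    """Ask for one question grounded in the given context (wording is hypothetical)."""
    lines = [
        "Generate one math question based on the following context.",
        f"Context: {item.context}",
    ]
    if item.answer is not None:
        # Answer-aware: constrain the question to have the given answer.
        lines.append(f"The answer to the question must be: {item.answer}")
    lines.append("Question:")
    return "\n".join(lines)


def context_unaware_prompt(level: str, topic: str, lesson: str, n: int = 5) -> str:
    """Ask for n question-answer pairs from curriculum metadata alone (wording is hypothetical)."""
    return (
        f"You are a math teacher for {level} classes.\n"
        f"Write {n} questions, each followed by its answer, "
        f"for the lesson '{lesson}' in the topic '{topic}'."
    )


print(context_aware_prompt(ContextAwareItem("Tom has 3 apples and buys 4 more.", "7")))
print(context_unaware_prompt("secondary", "Linear equations", "Solving two-step equations", n=3))
```

Keeping the answer optional lets one helper cover both answer-aware and answer-unaware generation. The returned strings would then be sent to ChatGPT through whatever client is available; the sketch deliberately leaves that call out.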

Paper Structure

This paper contains 44 sections, 8 tables, and 1 algorithm.

Introduction
Related Work
Large Language Models & Prompting
Pre-university Math Problems Generation
Problem Formulation
Context-aware
Context-unaware
Context-aware Methodology
Fine-tuning Baselines
Prompting ChatGPT
Context-unaware Methodology
TopicMath Creation
(1) Curriculum Collection
(2) Create Examples' Answers
(3) Curriculum Expert Verification
...and 29 more sections
