Automate Knowledge Concept Tagging on Math Questions with LLMs
Hang Li, Tianlong Xu, Jiliang Tang, Qingsong Wen
TL;DR
This paper investigates automating knowledge concept tagging for math questions with LLMs to scale intelligent educational systems. It frames tagging as a binary alignment task and develops zero-shot and few-shot prompting strategies, including self-reflection, to link questions to knowledge concepts without external inputs. A new MathKnowCT dataset (12 concepts; >80 questions per concept; expert-annotated) supports extensive evaluation across multiple instruction-tuned LLMs, with GPT-4 achieving high accuracy in zero-shot settings and notable variation across models. The study analyzes prompt design, knowledge interpretation, and self-reflection as key factors, and its results indicate practical potential for automated, high-precision concept tagging in math education.
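To make the binary alignment framing concrete, here is a minimal sketch of how one question-concept pair could be turned into a yes/no prompt, with optional few-shot demonstrations. The prompt wording, the function names (`build_prompt`, `parse_judgment`, `tag_question`), and the `llm` callable are illustrative assumptions, not the prompts or code used in the paper. The self-reflection step is omitted.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical demonstration format: (question text, does it match the concept?)
Demo = Tuple[str, bool]


def build_prompt(concept: str, definition: str, question: str,
                 demos: Optional[Sequence[Demo]] = None) -> str:
    """Build a yes/no prompt for one (question, concept) pair.

    With no demos this is a zero-shot prompt; with demos it is few-shot.
    The wording is an assumption made for illustration only.
    """
    lines = [
        "Judge whether the question tests the knowledge concept.",
        f"Knowledge concept: {concept}",
        f"Definition: {definition}",
    ]
    for demo_question, label in demos or []:
        lines.append(f"Question: {demo_question}")
        lines.append(f"Answer: {'Yes' if label else 'No'}")
    lines.append(f"Question: {question}")
    lines.append("Answer with Yes or No.")
    return "\n".join(lines)


def parse_judgment(response: str) -> bool:
    """Map a free-text model reply to a binary label (assumes it starts with Yes/No)."""
    return response.strip().lower().startswith("yes")


def tag_question(llm: Callable[[str], str],
                 concepts: Dict[str, str],
                 question: str) -> List[str]:
    """Return every concept whose binary judgment is positive.

    `llm` is any function mapping a prompt string to a reply string;
    it stands in for a real model call.
    """
    return [
        name
        for name, definition in concepts.items()
        if parse_judgment(llm(build_prompt(name, definition, question)))
    ]
```

Because each concept is judged independently, a question can receive zero, one, or several tags, and new concepts can be added without retraining anything. Any text-in, text-out model wrapper can be passed as `llm`.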
Abstract
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications, including learning progress diagnosis, practice question recommendation, and course content organization. Traditionally, these annotations have been produced manually with help from pedagogical experts, as the task requires not only a strong semantic understanding of both question stems and knowledge definitions but also deep insight into how question-solving logic connects to the corresponding knowledge concepts. In this paper, we explore automating the tagging task using Large Language Models (LLMs), because manual annotation cannot keep up with the rapidly growing demand for concept-tagged questions in advanced educational applications. Moreover, the zero/few-shot learning capability of LLMs makes them well suited to educational scenarios, where large-scale, expert-annotated datasets are often difficult to collect. Through extensive experiments with a variety of representative LLMs, we demonstrate that LLMs are a promising tool for concept tagging in math questions. Furthermore, through case studies examining the results from different LLMs, we draw empirical conclusions about the key factors for success in applying LLMs to the automatic concept tagging task.
