Iterative LLM-Based Generation and Refinement of Distracting Conditions in Math Word Problems
Kaiqi Yang, Hang Li, Yucheng Chu, Zitao Liu, Mi Tian, Hui Liu
TL;DR
This paper addresses how distracting or irrelevant information in math word problems (MWPs) affects large language models (LLMs). It introduces IGC-MWP, an LLM-driven iterative framework that generates distracting conditions while keeping the original solution unchanged, thereby reducing annotation effort and preserving data quality. The approach relies on a structured five-step prompt set and an automatic rejection mechanism to progressively refine problems through generation, quantitative and difficulty checks, and assessments of desirable and undesirable traits. Experiments on GSM-8K show that IGC-MWP yields higher-quality distractors, causing the largest performance drops among the compared methods and demonstrating improved realism and cognitive difficulty. The framework offers a scalable, deployable method for benchmarking and improving LLM reasoning on MWPs; future work focuses on quantitative quality metrics and contrastive tuning to further strengthen mathematical reasoning.
Abstract
Mathematical reasoning serves as a crucial testbed for the intelligence of large language models (LLMs), and math word problems (MWPs) are a popular type of math problem. Most MWP datasets consist of problems containing only the necessary information, while problems with distracting or excessive conditions are often overlooked. Prior work has tested popular LLMs and found a dramatic performance drop in the presence of distracting conditions. However, datasets of MWPs with distracting conditions are limited, and most suffer from low difficulty and out-of-context expressions. This makes the distracting conditions easy to identify and exclude, which reduces the credibility of benchmarks built on them. Moreover, adding distracting conditions may change the reasoning and the answers, requiring intensive labor to check and rewrite the solutions. To address these issues, we design an iterative framework that generates distracting conditions using LLMs. We develop a set of prompts to revise MWPs from different perspectives and cognitive levels, encouraging the generation of distracting conditions as well as suggestions for further revision. Another advantage is that the original and revised problems share the same solutions: we explicitly guide the LLMs to generate distracting conditions that do not alter the original solutions, thus avoiding the need to generate new ones. This framework is efficient and easy to deploy, reducing the overhead of generating MWPs with distracting conditions while maintaining data quality.
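The generate-then-check loop described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `refine`, `toy_generate`, `keeps_original`, and `adds_quantity` are hypothetical, the LLM call is replaced by a deterministic stub, and the two checks are toy heuristics standing in for the prompt-based assessments.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
# a generator proposes a revised problem from the original plus earlier
# feedback; a check returns (passed, suggestion_for_revision).
Generator = Callable[[str, List[str]], str]
Check = Callable[[str, str], Tuple[bool, str]]


@dataclass
class RefineResult:
    problem: Optional[str]  # accepted revised problem, or None if rejected
    rounds: int             # number of generation rounds used
    feedback: List[str] = field(default_factory=list)


def refine(original: str, generate: Generator, checks: List[Check],
           max_rounds: int = 3) -> RefineResult:
    """Regenerate until every check passes or the round budget runs out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(original, feedback)
        failures = []
        for check in checks:
            passed, suggestion = check(original, candidate)
            if not passed:
                failures.append(suggestion)
        if not failures:
            return RefineResult(candidate, round_no, feedback)
        # Rejected: carry the suggestions into the next generation round.
        feedback.extend(failures)
    return RefineResult(None, max_rounds, feedback)


def toy_generate(original: str, feedback: List[str]) -> str:
    # Stub for an LLM call: the first attempt is out of context,
    # later attempts react to feedback with an in-context quantity.
    distractor = ("Tom also owns 4 pencils." if feedback
                  else "The weather was sunny.")
    body, question = original.rsplit(". ", 1)
    return f"{body}. {distractor} {question}"


def keeps_original(original: str, candidate: str) -> Tuple[bool, str]:
    # Toy proxy for "the solution is unchanged": every original
    # sentence must still appear verbatim in the candidate.
    ok = all(part in candidate for part in original.split(". "))
    return ok, "Keep all original conditions unchanged."


def adds_quantity(original: str, candidate: str) -> Tuple[bool, str]:
    # Toy proxy for a quantitative check: the distractor should
    # introduce at least one new digit.
    def digits(text: str) -> int:
        return sum(ch.isdigit() for ch in text)
    ok = digits(candidate) > digits(original)
    return ok, "Add a numeric condition about an entity in the problem."


ORIGINAL = "Tom has 3 apples and buys 5 more. How many apples does Tom have?"
result = refine(ORIGINAL, toy_generate, [keeps_original, adds_quantity])
print(result.rounds, result.problem)
```

In this toy run the first candidate is rejected because its distractor carries no quantity, and the second is accepted. Because each accepted candidate keeps the original conditions intact, the original solution can be reused without writing a new one, which is the property the framework relies on.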
