QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Jiazheng Li, Hongzhou Lin, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Yi Wu, Jingzhao Zhang
TL;DR
QuestA presents a data-centric augmentation that injects partial-solution hints into hard prompts during RL training to scaffold mathematical reasoning. By prepending the first $p$% of a solution, QuestA creates a learnable curriculum that improves sample efficiency and expands reasoning capacity for 1.5B-scale models, achieving new state-of-the-art results on AIME24, AIME25, and HMMT25. The method is plug-and-play with existing RL pipelines and demonstrates strong generalization, including test-time performance without hints. Theoretical analysis shows hints reshape the learnability landscape, reducing the sampling budget required to discover informative trajectories, and empirical results confirm improved pass@k curves and broader problem coverage. Overall, QuestA offers a practical path to enhance reasoning in LLMs through targeted, scalable data augmentation.
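The hint-injection step described above can be illustrated with a minimal sketch. It assumes the "first $p$%" is measured in whitespace-separated words and that the hint is attached to the question under a plain-text label; the function names, the word-level cut, and the prompt template are all hypothetical choices made for illustration, not details taken from the paper.

```python
def partial_solution(solution: str, p: float) -> str:
    """Return roughly the first p percent of `solution`, cut on a word boundary.

    Measuring the prefix in words is an assumption of this sketch;
    tokens or characters would work the same way.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    words = solution.split()
    keep = int(len(words) * p / 100)  # number of leading words to reveal
    return " ".join(words[:keep])


def augment_question(question: str, solution: str, p: float) -> str:
    """Build a training prompt that carries a partial-solution hint.

    The "Hint (partial solution)" template is hypothetical.
    """
    hint = partial_solution(solution, p)
    if not hint:
        # Nothing to reveal (p is 0 or the solution is too short):
        # fall back to the original question.
        return question
    return f"{question}\n\nHint (partial solution):\n{hint}"
```

With `p = 0` the prompt is the unmodified question, which matches the test-time setting in which no hints are given.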
Abstract
Reinforcement learning (RL) has emerged as a central paradigm for training large language models (LLMs) in reasoning tasks. Yet recent studies question RL's ability to incentivize reasoning capacity beyond the base model. This raises a key challenge: how can RL be adapted to solve harder reasoning problems more effectively? To address this challenge, we propose a simple yet effective strategy via Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. Our method, QuestA, when applied during RL training on math reasoning tasks, improves not only pass@1 but also pass@k, particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks using 1.5B-parameter models: 72.50% (+10.73%) on AIME24, 62.29% (+12.79%) on AIME25, and 41.67% (+10.11%) on HMMT25. Code, data, and models are available at https://github.com/foreverlasting1202/QuestA.
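For readers unfamiliar with the pass@k metric mentioned above, the following is a minimal sketch of the commonly used unbiased estimator, $1 - \binom{n-c}{k} / \binom{n}{k}$, where $n$ samples are drawn per problem and $c$ of them are correct. The abstract does not state how pass@k is computed in this work, so using this particular estimator here is an assumption, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples drawn for the problem
    c: number of those samples that are correct
    k: evaluation budget, with 1 <= k <= n
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets that contain no correct sample;
    # it is 0 whenever k > n - c, which gives a pass@k of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 4 samples of which 1 is correct, `pass_at_k(4, 1, 1)` gives 0.25 and `pass_at_k(4, 1, 2)` gives 0.5.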
