What makes math problems hard for reinforcement learning: a case study
Ali Shehper, Anibal M. Medina-Mardones, Lucas Fagan, Bartłomiej Lewandowski, Angus Gruen, Yang Qiu, Piotr Kucharski, Zhenghan Wang, Sergei Gukov
TL;DR
This work investigates why certain math problems are exceptionally hard for reinforcement learning, using the Andrews–Curtis conjecture as a case study. It combines three approaches, classical search (BFS, greedy), reinforcement learning (PPO with varied horizons), and language modeling (decoder-only transformers), to study how hardness is distributed across balanced presentations, notably those in the Miller–Schupp and Akbulut–Kirby series. A central contribution is a principled global hardness measure based on persistent homology, together with analyses of local graph features that predict solvability. The authors report both practical algorithmic advances (supermoves, adaptive action spaces) and new mathematical results (length reductions for AK$(n)$ and AC-trivializations in MS subfamilies). They also connect stability concepts to knot theory, showing that stably AC-trivial presentations arise naturally from unknot diagrams and Wirtinger presentations, while noting misprints and caveats in the related literature. Overall, the paper offers a blueprint for learning-to-learn in hard mathematical search problems and provides concrete results that bridge deep mathematics and modern AI methodology.
Abstract
Using a long-standing conjecture from combinatorial group theory, we explore, from multiple perspectives, the challenges of finding rare instances carrying disproportionately high rewards. Based on lessons learned in the context defined by the Andrews–Curtis conjecture, we propose algorithmic enhancements and a topological hardness measure with implications for a broad class of search problems. As part of our study, we also address several open mathematical questions. Notably, we demonstrate the length reducibility of all but two presentations in the Akbulut–Kirby series (1981), and resolve various potential counterexamples in the Miller–Schupp series (1991), including three infinite subfamilies.
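To make the search space concrete, the following is a minimal sketch of how a balanced presentation and its Andrews–Curtis neighbors could be represented. It assumes the standard form of the Akbulut–Kirby series, AK$(n) = \langle x, y \mid x^n = y^{n+1},\ xyx = yxy \rangle$, and the standard AC moves (multiplying one relator by the other, inverting a relator, conjugating a relator by a generator). The integer encoding and the names `ak`, `total_length`, and `neighbors` are hypothetical and chosen for illustration; the action set used in the paper may differ.

```python
from typing import List, Tuple

# Assumed encoding: 1 = x, -1 = x^-1, 2 = y, -2 = y^-1.
Word = List[int]
Presentation = Tuple[Word, Word]


def free_reduce(word: Word) -> Word:
    """Cancel adjacent inverse pairs such as x x^-1."""
    out: Word = []
    for g in word:
        if out and out[-1] == -g:
            out.pop()
        else:
            out.append(g)
    return out


def cyclic_reduce(word: Word) -> Word:
    """Free-reduce, then strip matching inverse letters from both ends."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == -w[-1]:
        w = w[1:-1]
    return w


def invert(word: Word) -> Word:
    """Inverse of a word: reverse the order and invert each letter."""
    return [-g for g in reversed(word)]


def ak(n: int) -> Presentation:
    """Relators of AK(n), written as words equal to the identity."""
    r1 = [1] * n + [-2] * (n + 1)   # x^n y^-(n+1)
    r2 = [1, 2, 1, -2, -1, -2]      # xyx (yxy)^-1
    return r1, r2


def total_length(p: Presentation) -> int:
    """Sum of the cyclically reduced relator lengths."""
    return sum(len(cyclic_reduce(r)) for r in p)


def neighbors(p: Presentation) -> List[Presentation]:
    """Presentations reachable by one AC move (freely reduced)."""
    result: List[Presentation] = []
    for i in (0, 1):
        j = 1 - i
        candidates = [p[i] + p[j], invert(p[i])]          # multiply, invert
        candidates += [[g] + p[i] + [-g] for g in (1, -1, 2, -2)]  # conjugate
        for new_r in candidates:
            q = list(p)
            q[i] = free_reduce(new_r)
            result.append((q[0], q[1]))
    return result
```

In this encoding, `total_length(ak(n))` equals $2n + 7$, and each presentation has 12 neighbors under this particular choice of moves. A breadth-first or greedy search of the kind mentioned in the summary would repeatedly expand `neighbors` while tracking `total_length`, stopping when it reaches length 2.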
