SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance
Kunal Singh, Ankan Biswas, Sayandeep Bhowmick, Pradeep Moturi, Siva Kishore Gollapalli
TL;DR
SBSC (Step-By-Step Coding) is a multi-turn reasoning framework that decomposes Olympiad-level math problems into sub-tasks, each solved by generating and executing a program. At every turn, the model uses the previous execution outputs to define the next sub-task and its corresponding code, enabling granular self-correction and robust handling of constraints. Across AMC/AIME/MathOdyssey/OlympiadBench, SBSC with greedy decoding outperforms state-of-the-art approaches such as COT, PAL, and TIR-ToRA, and compares favorably with their self-consistency baselines. Ablation studies demonstrate the importance of exemplar design, code comments, and debugging ability, with performance correlating with the coding capabilities of the underlying LLMs. Overall, SBSC advances mathematical reasoning by combining stepwise task decomposition with executable code, offering improved accuracy and generalization for complex problems.
Abstract
We propose Step-by-Step Coding (SBSC): a multi-turn math reasoning framework that enables Large Language Models (LLMs) to generate a sequence of programs for solving Olympiad-level math problems. At each step/turn, by leveraging the code execution outputs and programs of previous steps, the model generates the next sub-task and the corresponding program to solve it. In this way, SBSC sequentially navigates to the final answer. SBSC allows a more granular, flexible and precise approach to problem-solving than existing methods. Extensive experiments highlight the effectiveness of SBSC in tackling competition and Olympiad-level math problems. For Claude-3.5-Sonnet, we observe that SBSC (greedy decoding) surpasses existing state-of-the-art (SOTA) program-generation-based reasoning strategies by an absolute 10.7% on AMC12, 8% on AIME and 12.6% on MathOdyssey. Given that SBSC is multi-turn in nature, we also benchmark SBSC's greedy decoding against self-consistency decoding results of existing SOTA math reasoning strategies and observe absolute performance gains of 6.2% on AMC, 6.7% on AIME and 7.4% on MathOdyssey.
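The multi-turn loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sbsc_loop`, `run_program`, `toy_model` and the `FINAL ANSWER:` stop marker are hypothetical, the LLM is replaced by a hard-coded stub, and each program is assumed to run in a fresh namespace so that information flows between turns only through the recorded programs and outputs.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple

# One turn of history: (sub-task description, program, execution output).
Turn = Tuple[str, str, str]
# Hypothetical marker a program prints when it has reached the final answer.
STOP_MARKER = "FINAL ANSWER:"


def run_program(code: str) -> str:
    """Execute one program in a fresh namespace and return its printed output."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        # Return the error text so a later turn could react to it.
        return buffer.getvalue() + f"Error: {exc!r}"
    return buffer.getvalue().strip()


def sbsc_loop(
    problem: str,
    model: Callable[[str, List[Turn]], Tuple[str, str]],
    max_turns: int = 10,
) -> Tuple[Optional[str], List[Turn]]:
    """Ask the model for the next sub-task and program until an answer appears."""
    history: List[Turn] = []
    for _ in range(max_turns):
        subtask, code = model(problem, history)
        output = run_program(code)
        history.append((subtask, code, output))
        if output.startswith(STOP_MARKER):
            return output[len(STOP_MARKER):].strip(), history
    return None, history


def toy_model(problem: str, history: List[Turn]) -> Tuple[str, str]:
    """Stand-in for an LLM: hard-coded two-step plan for one toy problem."""
    if not history:
        return (
            "List the positive divisors of 36",
            "print([d for d in range(1, 37) if 36 % d == 0])",
        )
    # Build the next program from the previous turn's execution output.
    divisors = history[-1][2]
    code = (
        "import math\n"
        f"divisors = {divisors}\n"
        "squares = [d for d in divisors if math.isqrt(d) ** 2 == d]\n"
        f"print('{STOP_MARKER}', sum(squares))"
    )
    return "Sum the divisors that are perfect squares", code


answer, history = sbsc_loop(
    "Find the sum of the positive divisors of 36 that are perfect squares.",
    toy_model,
)
print(answer)
```

In this toy run the second program is constructed from the first program's printed list of divisors, mirroring how each SBSC turn conditions on earlier execution outputs. A real setup would replace `toy_model` with an LLM call and run generated code in a sandbox rather than with a bare `exec`.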
