Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs
Jierui Li, Raymond Mooney
TL;DR
The paper addresses the difficulty of teaching LLMs algorithmic reasoning by distilling it from explanations of existing solutions rather than from attempts to solve problems directly. It introduces a three-role framework (Explainer, Reasoner, Coder): the Explainer generates editorial-style explanations for <problem, solution-program> pairs, the Reasoner learns from these explanations to produce problem-specific reasoning $\hat{d}$ for unseen problems, and the Coder implements code guided by the Reasoner's hints. In a data-efficient setup with 8248 <p_i, d_i> training examples, the authors show that fine-tuning the Reasoner on explanations significantly improves solve rates over zero-shot baselines and over learning directly from code, especially with Step-by-Step explanations and with diverse samples drawn from the Reasoner. They validate the approach on CodeContests and on a test set of Codeforces problems (CF Prob), reporting improved performance and the ability to tackle harder problems, alongside an ablation that highlights the benefits of explanation-guided training and careful data crafting. The work suggests a data-efficient direction for extending reasoning capabilities beyond code to domains such as mathematics or formal logic, and it provides a dataset for further research.
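The inference side of the three-role framework can be sketched as a short loop: sample several hints from the Reasoner, let the Coder implement each one, and stop at the first program that passes the tests. This is a minimal sketch and not the authors' implementation. The names `solve_with_hints`, `toy_reasoner`, `toy_coder`, and `toy_checker` are hypothetical, the models are replaced by plain Python callables, and counting a problem as solved when any sampled hint leads to a passing program is an assumption.

```python
from typing import Callable, Optional


def solve_with_hints(
    problem: str,
    reasoner: Callable[[str, int], str],   # hypothetical: (problem, sample index) -> hint
    coder: Callable[[str, str], str],      # hypothetical: (problem, hint) -> program text
    passes_tests: Callable[[str], bool],   # hypothetical: program text -> verdict
    num_hints: int = 4,
) -> Optional[str]:
    """Return the first program that passes, or None if no sampled hint works."""
    for sample_index in range(num_hints):
        # Each call stands in for one sampled "how-to-solve" hint from the Reasoner.
        hint = reasoner(problem, sample_index)
        # The Coder sees both the problem statement and the hint.
        program = coder(problem, hint)
        if passes_tests(program):
            return program
    return None


# Toy stand-ins so the sketch runs without any model (assumptions, not real models).
def toy_reasoner(problem: str, sample_index: int) -> str:
    hints = ["sort the array", "use a hash map"]
    return hints[sample_index % len(hints)]


def toy_coder(problem: str, hint: str) -> str:
    return f"# plan: {hint}\nprint('ok')"


def toy_checker(program: str) -> bool:
    # Pretend only the hash-map plan leads to an accepted program.
    return "hash map" in program
```

With the toy stand-ins, `solve_with_hints("p", toy_reasoner, toy_coder, toy_checker, num_hints=2)` succeeds on the second hint, while `num_hints=1` returns `None`, which illustrates why sampling more than one hint can raise the solve rate.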
Abstract
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models (LLMs) across various tasks. However, when tackling complex tasks that pose significant challenges for state-of-the-art models, this technique often struggles to produce effective chains of thought that lead to correct answers. In this work, we propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. We apply our method to solving competitive-level programming challenges. More specifically, we employ an LLM to generate explanations for a set of <problem, solution-program> pairs, then use <problem, explanation> pairs to fine-tune a smaller language model, which we refer to as the Reasoner, to learn algorithmic reasoning that can generate "how-to-solve" hints for unseen problems. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder, resulting in higher solve rates than strong chain-of-thought baselines on competitive-level programming problems. It also outperforms models that learn directly from <problem, solution-program> pairs. We curated an additional test set in the CodeContests format, which includes 246 more recent problems posted after the models' knowledge cutoff.
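The data construction described in the abstract can be sketched as follows: an Explainer turns each <problem, solution-program> pair into an explanation, and only the <problem, explanation> pair is kept as a fine-tuning record for the Reasoner. This is a minimal sketch under stated assumptions. The names `SolvedProblem`, `build_reasoner_records`, and `toy_explainer` are hypothetical, the `prompt`/`completion` record layout is an assumed format rather than the one used in the paper, and the Explainer is a plain callable standing in for an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class SolvedProblem:
    problem: str    # problem statement
    solution: str   # a known correct solution program


def build_reasoner_records(
    solved: Iterable[SolvedProblem],
    explainer: Callable[[str, str], str],  # hypothetical: (problem, solution) -> explanation
) -> List[Dict[str, str]]:
    """Build <problem, explanation> records; the solution program is not a training target."""
    records: List[Dict[str, str]] = []
    for item in solved:
        explanation = explainer(item.problem, item.solution)
        if not explanation.strip():
            continue  # skip pairs for which no explanation was produced
        records.append({"prompt": item.problem, "completion": explanation})
    return records


# Toy stand-in for the Explainer (an assumption; a real one would be an LLM call).
def toy_explainer(problem: str, solution: str) -> str:
    line_count = len(solution.splitlines())
    return f"Read the input, then apply the idea used in the {line_count}-line solution."
```

For example, passing `[SolvedProblem("Sum two integers", "a, b = map(int, input().split())\nprint(a + b)")]` with `toy_explainer` yields one record whose prompt is the problem statement and whose completion is the explanation. The solution program itself never appears in the record, which mirrors the abstract's contrast with models trained directly on <problem, solution-program> pairs.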
