Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs

Jierui Li, Raymond Mooney

TL;DR

The paper tackles the difficulty of enabling LLMs to perform algorithmic reasoning by distilling from explanations of solutions rather than from attempts to solve problems directly. It introduces a three-role framework (Explainer, Reasoner, Coder). The Explainer generates editorial-style explanations for <problem, solution-program> pairs, the Reasoner learns from these explanations to produce problem-specific reasoning (d̂), and the Coder implements code guided by the Reasoner's hints. Using a data-efficient training set of 8,248 <problem, solution, explanation> triplets, the authors show that fine-tuning the Reasoner on explanations improves solve rates over zero-shot baselines and over learning directly from code, especially when using Step-by-Step explanations and sampling diverse reasoning processes from the Reasoner. They validate the approach on CodeContests and on a test set of recent Codeforces problems (CF Prob), showing improved performance and the ability to solve harder problems. An ablation highlights the benefits of explanation-guided training and careful data crafting. The work suggests a promising, data-efficient direction for extending reasoning capabilities beyond code to domains such as mathematics or formal logic, and it also provides a dataset for further research.
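As a rough illustration of the training-data step described above, the sketch below turns <problem, solution-program> pairs into <problem, explanation> fine-tuning pairs for the Reasoner. It is a minimal sketch under stated assumptions, not the authors' code: `explain` is a hypothetical stand-in for the Explainer LLM, and the names `SolvedProblem`, `ReasonerExample`, and `build_reasoner_dataset` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SolvedProblem:
    problem: str   # problem statement
    solution: str  # oracle solution program


@dataclass
class ReasonerExample:
    prompt: str  # Reasoner input: the problem statement only
    target: str  # Reasoner output to learn: the Explainer's explanation


def build_reasoner_dataset(
    items: Iterable[SolvedProblem],
    explain: Callable[[str, str], str],  # hypothetical Explainer: (problem, solution) -> explanation
) -> List[ReasonerExample]:
    """Convert <problem, solution> pairs into <problem, explanation> fine-tuning pairs."""
    dataset = []
    for item in items:
        # The Explainer sees both the problem and its solution program.
        explanation = explain(item.problem, item.solution)
        # The solution is dropped from the prompt: the Reasoner must learn to
        # produce the explanation from the problem statement alone.
        dataset.append(ReasonerExample(prompt=item.problem, target=explanation))
    return dataset
```

The design point this sketch tries to capture is the asymmetry between roles: the Explainer is conditioned on the solution, while the Reasoner is trained without it, so at inference time it can produce "how-to-solve" reasoning for problems whose solutions are unknown.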

Abstract

Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models (LLMs) across various tasks. However, when tackling complex tasks that pose significant challenges for state-of-the-art models, this technique often struggles to produce effective chains of thought that lead to correct answers. In this work, we propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. We apply our method to solving competitive-level programming challenges. More specifically, we employ an LLM to generate explanations for a set of <problem, solution-program> pairs, then use <problem, explanation> pairs to fine-tune a smaller language model, which we refer to as the Reasoner, to learn algorithmic reasoning that can generate "how-to-solve" hints for unseen problems. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder, resulting in higher solve rates than strong chain-of-thought baselines on competitive-level programming problems. It also outperforms models that learn directly from <problem, solution-program> pairs. We curated an additional test set in the CodeContests format, which includes 246 more recent problems posted after the models' knowledge cutoff.
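The inference side can be sketched in the same spirit. Assuming the Reasoner and Coder are exposed as plain callables (the names `reason`, `code`, `passes_public_tests`, `solve_with_hints`, and `solve_rate` are hypothetical), a problem counts as solved here when some Reasoner-hinted program passes the public tests within a fixed sampling budget. The paper additionally judges such programs on private tests, which this sketch omits, so it only approximates the reported solve rate.

```python
from typing import Callable, List, Optional


def solve_with_hints(
    problem: str,
    reason: Callable[[str], str],                    # Reasoner: problem -> "how-to-solve" hint
    code: Callable[[str, str], str],                 # Coder: (problem, hint) -> program source
    passes_public_tests: Callable[[str, str], bool], # (problem, program) -> passes public tests?
    num_samples: int = 10,
) -> Optional[str]:
    """Sample up to `num_samples` hints; return the first program that passes public tests."""
    for _ in range(num_samples):
        hint = reason(problem)         # one sampled reasoning process
        program = code(problem, hint)  # implementation guided by that hint
        if passes_public_tests(problem, program):
            return program             # candidate for final judging
    return None


def solve_rate(problems: List[str], solver: Callable[[str], Optional[str]]) -> float:
    """Fraction of problems for which `solver` returns a candidate program."""
    if not problems:
        return 0.0
    solved = sum(1 for p in problems if solver(p) is not None)
    return solved / len(problems)
```

For example, with stub callables that only "solve" problems whose statement starts with "easy", `solve_rate(["easy A", "hard B"], ...)` returns 0.5.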

Paper Structure (28 sections, 1 equation, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Comparison between Solve-based and Explain-based chain-of-thought distillation. Top: Solve-based CoT distillation is likely to generate incorrect or inefficient solutions. Bottom: Explain-based CoT distillation can generate high-quality reasoning processes by explaining the oracle solution.
  • Figure 2: The framework of our approach. We use an Explainer LLM to generate explanations given <problem, solution-program> pairs, then train a Reasoner LLM to generate explanations given problem statements alone. At inference time, given a problem, the Reasoner generates a reasoning process in the same format as the solution explanations, which can be provided to the Coder as a hint to help it solve the problem.
  • Figure 3: Final online judgment of programs that pass public tests. Accepted: correct; TLE: time limit exceeded; WA: wrong answer(s) on private tests; Other: memory limit exceeded, runtime error, etc.
  • Figure 4: Difficulty statistics of the problems solved with the fine-tuned or zero-shot Reasoner when sampling 100 reasoning processes per problem.
  • Figure 5: Explainer prompt example: the prompt used to generate explanations from the Explainer (GPT-4). The problem and solution are from Li et al. (2022).
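Figure 3 groups the final judgments of programs that pass the public tests into four categories. As a minimal sketch of how such a breakdown can be computed, assuming hypothetical short verdict codes such as "AC", "TLE", "WA", "MLE", and "RE" (the paper does not specify a raw format), the illustrative function `verdict_breakdown` below maps each verdict to a category and reports the share of programs in each.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical raw verdict codes mapped to the Figure 3 categories;
# a real judge may report different strings.
CATEGORY = {"AC": "Accepted", "TLE": "TLE", "WA": "WA"}
ORDER = ("Accepted", "TLE", "WA", "Other")


def verdict_breakdown(verdicts: Iterable[str]) -> Dict[str, float]:
    """Share of judged programs in each Figure 3 category.

    Any verdict not listed in CATEGORY (e.g. memory limit exceeded or
    runtime error) is counted as "Other".
    """
    counts = Counter(CATEGORY.get(v, "Other") for v in verdicts)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in ORDER}
    return {name: counts[name] / total for name in ORDER}
```

For instance, `verdict_breakdown(["AC", "WA", "TLE", "MLE"])` assigns a share of 0.25 to each of the four categories.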