Explaining Competitive-Level Programming Solutions using LLMs

Jierui Li, Szymon Tworkowski, Yingying Wu, Raymond Mooney

TL;DR

This work investigates using large language models to generate structured, natural-language explanations for competitive-level programming <problem, solution> pairs, aiming to bridge problem understanding and implementation. It introduces a Specific-to-General explanation framework and an Explanation Instructed Solver that uses explanations as hints to improve program synthesis, evaluated on CodeContests with both human and automatic metrics. Results show that description-centric explanations, especially step-by-step descriptions, substantially boost solve rates, and GPT-4 consistently outperforms GPT-3.5 in capturing the key ideas and providing usable hints. The study highlights the potential of automatic explanations to generate silver-standard data and guide future reasoning-model improvements for algorithmic problem solving. It also discusses limitations and avenues for scaling to broader problem sets and models, and notes concerns about the subjectivity of evaluation and about safety.

Abstract

In this paper, we approach competitive-level programming problem-solving as a composite task of reasoning and code generation. We propose a novel method to automatically annotate <problem, solution> pairs with natural language explanations. We show that despite poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity in describing and explaining solutions. Our explanation generation methodology can generate a structured solution explanation for the problem containing descriptions and analysis. To evaluate the quality of the annotated explanations, we examine their effectiveness in two aspects: 1) satisfying the human programming expert who authored the oracle solution, and 2) aiding LLMs in solving problems more effectively. The experimental results on the CodeContests dataset demonstrate that while GPT-3.5's and GPT-4's abilities in describing the solution are comparable, GPT-4 shows a better understanding of the key idea behind the solution.
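
The second evaluation aspect, giving an explanation to a solver as a hint, can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the paper's actual prompt or data format: the `Explanation` class, its `descriptions` and `analysis` fields (named after the two kinds of content the abstract mentions), the `build_hinted_prompt` helper, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Explanation:
    """Hypothetical container for a structured explanation (not the paper's schema)."""
    descriptions: List[str] = field(default_factory=list)  # what the solution does
    analysis: List[str] = field(default_factory=list)      # why it works


def build_hinted_prompt(problem: str, explanation: Explanation,
                        include_analysis: bool = False) -> str:
    """Assemble a solver prompt that appends explanation points as a hint.

    The wording is a placeholder; the paper's real prompts are shown in its Figure 1.
    """
    parts = ["Problem:", problem.strip(), "", "Hint:"]
    parts += [f"- {point}" for point in explanation.descriptions]
    if include_analysis:
        parts += [f"- {point}" for point in explanation.analysis]
    parts += ["", "Write a program that solves the problem."]
    return "\n".join(parts)


# Toy usage with made-up content.
toy = Explanation(
    descriptions=["Sort the array, then scan it once."],
    analysis=["Sorting dominates the running time."],
)
prompt = build_hinted_prompt("Given an array, find the closest pair.", toy)
print(prompt)
```

Keeping descriptions and analysis in separate fields makes it easy to vary which parts of an explanation the solver sees, which is the kind of comparison the evaluation describes.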

Paper Structure (28 sections, 4 figures, 8 tables)

Figures (4)

  • Figure 1: The explanation generation and evaluation framework and corresponding prompts (Top). An example of the full explain prompt (Bottom Left) and the model's output is in Appendix Table \ref{tab:case_study}. The blue points are descriptions while the grey points are analysis. We give the explanation based on the oracle solution to the instructed solver as a hint (Bottom Right) to evaluate the quality of the generated explanation.
  • Figure 2: The Baseline Solver Prompt and General-to-Specific (G2S) Prompt which asks LLMs to follow the reasoning steps till it reaches the state of implementation.
  • Figure 3: Human Likert scores ($-2$: very poor to $2$: excellent) evaluating various aspects of the explanations.
  • Figure 4: The aiding effects of 3 levels of Solution Description over different difficulty ratings. The difference in color shows the gain in solve@10.
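
The solve@10 metric named in Figure 4 is not defined in this excerpt. The sketch below assumes the common reading of solve@k: a problem counts as solved if at least one of the first k sampled programs passes all tests, and the score is the fraction of problems solved. The function name `solve_at_k` and the input layout (problem id mapped to a list of pass/fail flags) are hypothetical.

```python
from typing import Dict, List


def solve_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one passing program among the first k samples.

    `results` maps a problem id to per-sample pass/fail flags (an assumed layout).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not results:
        return 0.0
    solved = sum(1 for passed in results.values() if any(passed[:k]))
    return solved / len(results)


# Toy data: problem_a is solved by its second sample, problem_b is never solved.
results = {
    "problem_a": [False, True, False],
    "problem_b": [False, False, False],
}
print(solve_at_k(results, k=1))  # 0.0
print(solve_at_k(results, k=3))  # 0.5
```

Under this reading, the gain shown in Figure 4 would be the difference between `solve_at_k` computed with and without the explanation hint on the same set of problems.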