Explaining Competitive-Level Programming Solutions using LLMs
Jierui Li, Szymon Tworkowski, Yingying Wu, Raymond Mooney
TL;DR
This work investigates using large language models to generate structured, natural-language explanations for competitive-level programming <problem, solution> pairs, aiming to bridge problem understanding and implementation. It introduces a Specific-to-General explanation framework and an Explanation Instructed Solver that uses explanations as hints to improve program synthesis, evaluated on CodeContests with both human and automatic metrics. Results show that description-centric explanations, especially step-by-step descriptions, substantially boost solve rates, and GPT-4 consistently outperforms GPT-3.5 in capturing the key ideas and providing usable hints. The study highlights the potential of automatic explanations to generate silver-standard data and to guide future improvements in reasoning models for algorithmic problem solving. It also discusses limitations and avenues for scaling to broader problem sets and models, and notes concerns about the subjectivity of evaluation and about safety.
Abstract
In this paper, we approach competitive-level programming problem-solving as a composite task of reasoning and code generation. We propose a novel method to automatically annotate <problem, solution> pairs with natural-language explanations. We show that, despite their poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity for describing and explaining solutions. Our explanation generation methodology produces a structured explanation of the solution to a problem, containing both descriptions and analysis. To evaluate the quality of the annotated explanations, we examine their effectiveness in two respects: 1) satisfying the human programming expert who authored the oracle solution, and 2) aiding LLMs in solving problems more effectively. The experimental results on the CodeContests dataset demonstrate that while GPT-3.5 and GPT-4 are comparable in their ability to describe the solution, GPT-4 shows a better understanding of the key idea behind the solution.
