Leveraging Print Debugging to Improve Code Generation in Large Language Models
Xueyu Hu, Kun Kuang, Jiankai Sun, Hongxia Yang, Fei Wu
TL;DR
The paper addresses the difficulty of generating correct code for problems that involve complex data structures and algorithms by introducing an in-context learning loop built on print debugging. The model is guided to insert print statements, run the code against test cases, and analyze the resulting logs to locate and fix bugs, using the test cases and execution traces as interpretable feedback. Evaluated with GPT-4 on a LeetCode dataset, the approach outperforms rubber duck debugging on easy and medium problems by 1.5% and 17.9%, respectively, while hard problems remain resistant to improvement. Ablation and case-study analyses indicate that combining test-case explanations with execution logs is key to effective debugging, suggesting a path toward more robust, log-informed code generation in LLMs.
Abstract
Large language models (LLMs) have made significant progress in code generation tasks, but their performance on programming problems with complex data structures and algorithms remains suboptimal. To address this issue, we propose an in-context learning approach that guides LLMs to debug using a "print debugging" method: inserting print statements to trace execution and analyzing the resulting logs to fix the bug. We collect a LeetCode problem dataset and evaluate our method using the LeetCode online judging system. Experiments with GPT-4 demonstrate the effectiveness of our approach, which outperforms rubber duck debugging on easy and medium-level LeetCode problems by 1.5% and 17.9%, respectively.
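The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `run_with_logs`, `print_debug_loop`, and `toy_revise` are hypothetical, the "model" is replaced by a hard-coded stub, and a single local test case stands in for the LeetCode online judge. It only shows the general shape of the idea: run instrumented code, capture the print log, and pass the failing test together with the log to a revision step.

```python
import contextlib
import io
from typing import Callable, List, Tuple


def run_with_logs(source: str, func_name: str, args: tuple) -> Tuple[object, str]:
    """Execute candidate source and capture whatever its print statements emit."""
    buffer = io.StringIO()
    namespace: dict = {}
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)  # defines the candidate function
        result = namespace[func_name](*args)
    return result, buffer.getvalue()


def print_debug_loop(
    source: str,
    func_name: str,
    tests: List[Tuple[tuple, object]],
    revise: Callable[[str, tuple], str],
    max_rounds: int = 3,
) -> str:
    """Run tests; on the first failure, hand the test and its log to `revise`."""
    for _ in range(max_rounds):
        failure = None
        for args, expected in tests:
            result, log = run_with_logs(source, func_name, args)
            if result != expected:
                failure = (args, expected, result, log)
                break
        if failure is None:
            return source  # all tests pass
        source = revise(source, failure)
    return source  # may still be failing after max_rounds


# Toy candidate with an off-by-one bug; the print line plays the role of
# the trace statements the model would be asked to insert.
BUGGY = '''
def total(nums):
    acc = 0
    for i in range(1, len(nums)):
        print(f"i={i}, nums[i]={nums[i]}, acc={acc}")
        acc += nums[i]
    return acc
'''


def toy_revise(source: str, failure: tuple) -> str:
    """Hypothetical stand-in for the LLM call (assumption, not a real API).

    A real system would prompt the model with the failing test and the log;
    here the fix is hard-coded: the log never shows i=0, so start the loop at 0.
    """
    return source.replace("range(1, len(nums))", "range(len(nums))")


fixed = print_debug_loop(BUGGY, "total", [(([1, 2, 3],), 6)], toy_revise)
```

In this sketch the captured log is what makes the failure interpretable: the trace starts at `i=1`, which points at the skipped first element. In a real setting `revise` would be a prompt to the model, and executing generated code with `exec` would need sandboxing.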
