Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
Yi Hu, Haotong Yang, Zhouchen Lin, Muhan Zhang
TL;DR
This work introduces code prompting, a neural-symbolic prompting approach that elicits executable Python code from LLMs as an intermediate reasoning form. The method uses a two-stage pipeline: code generation, followed by either code execution or LLM-driven reasoning about the code. It achieves substantial improvements over traditional chain-of-thought (CoT) prompting on symbolic and arithmetic benchmarks. Extensive ablations, error analyses, and enhancements such as self-debugging and code annotations show that code-based representations reduce ambiguity and improve robustness, and ensembles that combine CoT and code prompting further boost performance (e.g., 87.95% on GSM8K). The findings highlight the value of integrating symbolic, executable reasoning into LLM workflows and point to practical directions for hybrid prompting strategies and tool-assisted reasoning.
Abstract
Large language models (LLMs) have scaled up to unlock a wide range of complex reasoning tasks with the aid of various prompting methods. However, current prompting methods generate natural language intermediate steps to help reasoning, which can cause imperfect task reduction and confusion. To mitigate such limitations, we explore code prompting, a neural symbolic prompting method with both zero-shot and few-shot versions that elicits code as intermediate steps. We conduct experiments on seven widely used benchmarks involving symbolic reasoning and arithmetic reasoning. Code prompting generally outperforms chain-of-thought (CoT) prompting. To further understand the performance and limitations of code prompting, we perform extensive ablation studies and error analyses, and identify several exclusive advantages of symbolic prompting over natural language prompting. We also consider an ensemble of code prompting and CoT prompting to combine the strengths of both. Finally, we show through experiments how code annotations and their locations affect code prompting.
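To make the two-stage idea concrete, the following is a minimal sketch of the execution variant, in which the model first produces code and the code is then run to obtain the answer. It is an illustration under stated assumptions, not the paper's implementation: the prompt template `CODE_PROMPT`, the helper names, the convention of storing the result in a variable named `answer`, and the canned `stub_llm` response are all hypothetical, and the actual prompts used in the experiments may differ.

```python
from typing import Callable

# Hypothetical zero-shot template (an assumption, not the paper's actual prompt).
CODE_PROMPT = (
    "Question: {question}\n"
    "Write Python code that solves the question and stores the result "
    "in a variable named `answer`.\n"
)


def generate_code(llm: Callable[[str], str], question: str) -> str:
    """Stage 1: ask the model for code as the intermediate reasoning step."""
    return llm(CODE_PROMPT.format(question=question))


def execute_code(code: str) -> str:
    """Stage 2 (execution variant): run the generated code and read `answer`."""
    namespace: dict = {}
    # Model output is untrusted; a real system should run it in a sandbox.
    exec(code, namespace)
    return str(namespace["answer"])


def code_prompting(llm: Callable[[str], str], question: str) -> str:
    """Chain the two stages: code generation, then code execution."""
    return execute_code(generate_code(llm, question))


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned program."""
    return "apples = 3\nbought = 2 * 4\nanswer = apples + bought\n"


question = "Tom has 3 apples and buys 2 bags of 4 apples each. How many apples does he have?"
result = code_prompting(stub_llm, question)
print(result)
```

Swapping `stub_llm` for a real model call would leave the rest of the sketch unchanged. The alternative second stage mentioned in the TL;DR, where an LLM reasons about the generated code instead of executing it, would replace `execute_code` with another model call.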
