Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass
TL;DR
Natural Language Embedded Programs (NLEP) unify language-based reasoning with executable program synthesis by prompting LLMs to emit fully runnable Python code that operates on natural-language-encoded knowledge; a Python interpreter runs the code and returns the result, making the reasoning trace explicit. The approach applies task-general prompts across math, symbolic reasoning, QA, instruction following, and text classification, achieving higher accuracy and better prompt efficiency than standard chain-of-thought (CoT) and program-of-thought (PoT) baselines on most tasks, with GPT-4 showing the strongest gains. NLEP is also interpretable, since the generated programs lay out the reasoning steps executed by the interpreter, and a model-free variant shows potential for fast, interpretable classification. Limitations include variable gains on GSM-Hard and reduced performance on long-form natural language outputs; future work aims to extend the technique to longer outputs and more diverse tools while addressing alignment concerns.
Abstract
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output. Despite using a task-general prompt, we find that this approach can improve upon strong baselines across a range of different tasks including math and symbolic reasoning, text classification, question answering, and instruction following. We also find that the generated programs are interpretable, since they outline the exact reasoning process followed by the program interpreter.
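To make the program structure described above concrete, the following is a minimal hand-written sketch of what an NLEP-style program might look like. It is not taken from the paper: the question ("Which employees started on a Monday?"), the `employees` records, and the `started_on` helper are all hypothetical and chosen only for illustration. The sketch follows the pattern the abstract describes: a data structure holding natural language knowledge, a function that performs the symbolic step, and a printed natural language answer.

```python
from datetime import date

# Step 1: structured knowledge with natural-language fields.
# These records are hypothetical and exist only for this illustration.
employees = {
    "Alice": {"role": "data engineer", "start_date": "2024-01-01"},
    "Bob": {"role": "product designer", "start_date": "2024-02-14"},
    "Carol": {"role": "research scientist", "start_date": "2024-03-04"},
}

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


# Step 2: a function that carries out the symbolic reasoning step.
def started_on(weekday_name, records):
    """Return the names whose start date falls on the given weekday."""
    names = []
    for name, info in records.items():
        start = date.fromisoformat(info["start_date"])
        if WEEKDAY_NAMES[start.weekday()] == weekday_name:
            names.append(name)
    return names


# Step 3: compute the result and print it as a natural language answer.
matches = started_on("Monday", employees)
answer = f"The employees who started on a Monday are: {', '.join(matches)}."
print(answer)
```

Because the weekday computation is delegated to the interpreter rather than to the language model, each step from the stored knowledge to the printed answer can be inspected directly in the program text.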
