Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Hyungjoo Chae; Yeonghyeon Kim; Seungone Kim; Kai Tzu-iunn Ong; Beong-woo Kwak; Moohyeon Kim; Seonghwan Kim; Taeyoon Kwon; Jiwan Chung; Youngjae Yu; Jinyoung Yeo

TL;DR

The paper tackles the challenge of algorithmic reasoning in large language models by introducing Think-and-Execute, a two-stage framework that first uncovers task-level reasoning logic and encodes it as pseudocode (Think), then tailors and executes this logic for each instance (Execute). By separating planning from execution and using a pseudocode representation, the approach enables reuse of logic across multiple instances and improves reasoning performance across seven Big-Bench Hard tasks, outperforming direct prompting, zero-shot CoT, and instance-specific Python code. Empirical results show the method transfers to small LMs like CodeLlama and that pseudocode generally describes the task logic better than natural language plans. Ablation and analysis reveal the importance of semantic content, pre-analysis, and code knowledge, underscoring the method’s reliance on task-level structure and code familiarity to enhance algorithmic reasoning.
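To make the idea of reusable task-level logic concrete, here is a minimal sketch in Python. The bracket-completion task, the function name `complete_brackets`, and the trace format are assumptions chosen for illustration; they are not taken from the paper's prompts. The point is only that one piece of logic, written once, serves every instance of the task, and that printing intermediate state gives a step-by-step trace of the kind an LM would have to simulate.

```python
def complete_brackets(sequence: str) -> str:
    """Hypothetical task-level logic: return the closers that balance `sequence`.

    Assumes tokens are space-separated and every closer matches the most
    recently opened bracket (no validation is attempted).
    """
    pairs = {"(": ")", "[": "]", "{": "}", "<": ">"}
    stack = []
    for step, token in enumerate(sequence.split(), start=1):
        if token in pairs:
            stack.append(token)   # an opener waits for its closer
        else:
            stack.pop()           # a closer resolves the latest opener
        # Intermediate state, analogous to a reasoning trace.
        print(f"step {step}: read {token!r}, still open: {stack}")
    # Close the remaining openers in reverse order of opening.
    return " ".join(pairs[b] for b in reversed(stack))


# The same logic is reused for every instance of the task.
for instance in ["( [ {", "< ( ) ["]:
    print(instance, "->", complete_brackets(instance))
```

An instance-specific program, by contrast, would have to be regenerated for each input even though the underlying logic is identical.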

Abstract

Algorithmic reasoning refers to the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps towards the solution. This makes algorithmic reasoning a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use programming languages (e.g., Python) to express the logic needed to solve a given instance/question (e.g., Program-of-Thought), motivated by the strict and precise syntax of such languages. However, it is non-trivial to write executable code that expresses the correct logic on the fly within a single inference call. Also, code generated specifically for one instance cannot be reused for others, even if they come from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover a task-level logic that is shared across all instances of a given task and then express the logic with pseudocode; (2) In Execute, we further tailor the generated pseudocode to each instance and simulate the execution of the code. With extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), suggesting the helpfulness of discovering task-level logic. Also, we show that pseudocode can guide the reasoning of LMs better than natural language, even though LMs are trained to follow natural language instructions.
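The two-step decomposition can be sketched as a small driver in Python. This is a sketch under stated assumptions, not the authors' implementation: the `LM` callable, the function names, and all prompt wording are hypothetical, and the stub `echo_lm` stands in for a real model so the control flow can be run without any API. What it shows is the cost structure described above: Think runs once per task, Execute runs once per instance and reuses the same pseudocode.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt to a completion.
LM = Callable[[str], str]


def think(lm: LM, task_description: str, examples: List[str]) -> str:
    """Run once per task: analyze the task, then write a pseudocode prompt."""
    # Build a meta prompt and ask for an analysis (wording is illustrative).
    meta_prompt = "\n".join(
        [f"Task: {task_description}", "Example questions:", *examples,
         "Describe the reasoning pattern shared by all instances."]
    )
    analysis = lm(meta_prompt)
    # Turn the analysis into task-level pseudocode.
    return lm(f"Analysis: {analysis}\nWrite pseudocode that solves any instance.")


def execute(lm: LM, pseudocode: str, instance: str) -> str:
    """Run once per instance: simulate the pseudocode on this input."""
    return lm(f"{pseudocode}\nInput: {instance}\nSimulate the execution step by step.")


def think_and_execute(
    lm: LM, task_description: str, examples: List[str], instances: List[str]
) -> Dict[str, str]:
    pseudocode = think(lm, task_description, examples)  # shared across instances
    return {inst: execute(lm, pseudocode, inst) for inst in instances}


# Stub model that records prompts, so the sketch runs without a real LM.
calls: List[str] = []


def echo_lm(prompt: str) -> str:
    calls.append(prompt)
    return f"<completion {len(calls)}>"


answers = think_and_execute(
    echo_lm, "Close the brackets", ["( [ -> ] )"], ["( {", "< ["]
)
print(len(calls), "model calls for", len(answers), "instances")
```

With the stub, two calls are spent on Think regardless of how many instances follow, and each instance adds exactly one Execute call.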

Paper Structure (73 sections, 4 figures, 13 tables)

Introduction
Think-and-Execute
Think: Describing the Underlying Logic of a Task in a Pseudocode Format
Step 1: Constructing a meta prompt.
Step 2: Analyzing the target task.
Step 3: Generating a pseudocode prompt based on the analysis.
Execute: Simulating the Execution of Pseudocode Prompt for an Instance
Experimental Setup
Datasets
Baselines
Models
Results
Think-and-Execute Improves Algorithmic Reasoning
Task-level Pseudocode Prompts Benefit a Wider Range of Algorithmic Reasoning Tasks than Instance-specific Python Code
The Logic Discovered by an LLM can be Transferred to SLMs
...and 58 more sections

Figures (4)

Figure 1: An illustration of Think-and-Execute, compared with Zero-shot Chain-of-Thought (Kojima et al., 2022) and Program-of-Thoughts (Chen et al., 2023).
Figure 2: An overview of Think-and-Execute. In Think (Top), an LLM analyzes the given task provided in the meta prompt and generates a pseudocode prompt that describes the necessary logic for solving the task. Then, in Execute (Bottom), the LLM conducts reasoning for each instance by simulating the execution of the pseudocode prompt.
Figure 3: Ablation study of the components of pseudocode prompt using GPT-3.5-Turbo.
Figure 4: Analysis of the effect of code pre-training on reasoning capability when applying Think-and-Execute. Without pre-training on code corpora, accuracy drops notably.
