Watch Your Steps: Observable and Modular Chains of Thought

Cassandra A. Cohen, William W. Cohen

TL;DR

This work identifies "non-local errors" as an unaddressed issue in CoT learning, presents methods for verifying the modularity of steps in a CoT explanation, and enables new types of analysis.

Abstract

We propose a variant of chain of thought (CoT) prompting called Program Trace Prompting that makes explanations more observable while preserving the power, generality and flexibility of CoT. In our approach, few-shot CoT demonstrations are wrapped in a formal syntax based on Python, and each prompt: identifies and names steps; defines the input/output behavior of steps; and replaces CoT explanations of in-context examples with chains of these formalized steps on the same examples. Program Trace Prompting is applicable to many tasks, achieving strong results on the 23 diverse tasks in the BIG-Bench Hard benchmark. More importantly, by instrumenting explanations in this way, we enable new types of analysis. In particular, we identify "non-local errors" (which correspond to incorrectly learning the reasoning method illustrated in the demonstrations) as an unaddressed issue in CoT learning, and we present methods for verifying the modularity of steps in a CoT explanation.
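
The prompt construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stub bodies, the `Calling ...` / `... returned` trace format, the instruction wording, the example sentences, and the helper `build_prompt` are all assumptions made for the example. Only the step names `sport_for` and `consistent_sports` come from the figure captions.

```python
import textwrap

# Stubs: names, type signatures, and documentation for each step, but no code.
# (Hypothetical docstrings; the source only names the steps.)
STUBS = textwrap.dedent('''\
    def sport_for(x: str) -> str:
        """Return the sport associated with a player or an action."""
        ...

    def consistent_sports(sport1: str, sport2: str) -> bool:
        """Return True if the two sports are compatible."""
        ...
''')

# One few-shot demonstration, rewritten as a chain of formalized steps.
# The trace format here is an assumed convention for illustration.
DEMO_TRACE = textwrap.dedent('''\
    >>> sports_understanding('The striker scored a header.')
    Calling sport_for('The striker')...
    ...sport_for returned 'soccer'
    Calling sport_for('scored a header')...
    ...sport_for returned 'soccer'
    Calling consistent_sports('soccer', 'soccer')...
    ...consistent_sports returned True
    Final answer: yes
''')

# Assumed instruction text; the real prompt wording is not given in the source.
INSTRUCTION = (
    "Below is a partial program with documentation and example traces but no "
    "implementation. Predict the trace the program would produce on the final input."
)


def build_prompt(stubs: str, demo_traces: list[str], test_input: str) -> str:
    """Combine stubs and demonstration traces into a skeleton, then append the test input."""
    skeleton = stubs.rstrip() + "\n\n" + "\n".join(t.rstrip() for t in demo_traces)
    return f"{INSTRUCTION}\n\n{skeleton}\n\n>>> {test_input}\n"


prompt = build_prompt(
    STUBS, [DEMO_TRACE], "sports_understanding('The goalie blocked a slap shot.')"
)
print(prompt)
```

The resulting string would be sent to an LLM, whose completion is expected to be a trace in the same format ending in a final answer.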

Paper Structure

This paper contains 41 sections, 8 equations, 5 figures, and 11 tables.

Figures (5)

  • Figure 1: An illustration of Program Trace Prompting on a simplified version of a task from BIG-Bench Hard. (1) Instead of the original CoT prompt, we begin with a trace of a Python pseudo-program that implements a similar problem-solving strategy (see text for how this trace is generated). (2) This trace is then inserted into a set of "stubs", which document the subroutines used in the trace, yielding a skeleton of a program that contains traces, type signatures, and documentation, but no code. (3) The skeleton program is inserted, along with a test input (4), into a prompt that instructs an LLM to predict the output of the program on the test input. (5) This prompt is then sent to an LLM, which produces (6) a predicted program trace containing the desired prediction for the test output (in this case, the word "yes").
  • Figure 2: Syntactic well-formedness of generated traces on BBH tasks.
  • Figure 3: Top, part of a correct program trace. Middle, a local error: the first sport_for step returns an incorrect result (red), which causes an incorrect answer. Bottom, a non-local error: the consistent_sports call should have the bold-faced arguments 1 and 2 copied over from the first and second sport_for outputs, respectively, but the red bold-faced argument was not copied correctly.
  • Figure 4: Left, code for a mock for a simplified BBH task. Right, a Program Trace prompt derived from the mock.
  • Figure 5: Prompts to generate a trace and continue a partial trace.
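
The non-local error in Figure 3 (an argument that should have been copied from an earlier step's output but was not) suggests a simple mechanical check on a generated trace. The sketch below is only an illustration of that idea, not the verification method of the paper. It assumes a flat, non-nested trace in a `Calling f(args)...` / `...f returned value` format with Python-literal arguments, and the helpers `parse_trace` and `uncopied_arguments` are hypothetical names introduced here.

```python
import ast
import re

# Assumed trace line formats (not confirmed by the source).
CALL_RE = re.compile(r"^Calling (\w+)\((.*)\)\.\.\.$")
RETURN_RE = re.compile(r"^\.\.\.(\w+) returned (.*)$")


def parse_trace(trace: str) -> list[tuple[str, tuple, object]]:
    """Parse a flat trace into (step_name, args, result) triples."""
    steps = []
    pending = None  # the most recent call still waiting for its return line
    for raw in trace.strip().splitlines():
        line = raw.strip()
        call = CALL_RE.match(line)
        ret = RETURN_RE.match(line)
        if call:
            arg_text = call.group(2).strip()
            # Wrap the arguments in a tuple literal so literal_eval can read them.
            args = ast.literal_eval(f"({arg_text},)") if arg_text else ()
            pending = (call.group(1), args)
        elif ret and pending and ret.group(1) == pending[0]:
            steps.append((pending[0], pending[1], ast.literal_eval(ret.group(2))))
            pending = None
    return steps


def uncopied_arguments(steps, step_name: str) -> list:
    """Return arguments of `step_name` calls that match no earlier step output."""
    problems = []
    seen_outputs = []
    for name, args, result in steps:
        if name == step_name:
            problems.extend(a for a in args if a not in seen_outputs)
        seen_outputs.append(result)
    return problems


# Hypothetical traces: the second one miscopies the second sport_for output.
GOOD_TRACE = """
Calling sport_for('The striker')...
...sport_for returned 'soccer'
Calling sport_for('hit a home run')...
...sport_for returned 'baseball'
Calling consistent_sports('soccer', 'baseball')...
...consistent_sports returned False
"""
BAD_TRACE = GOOD_TRACE.replace(
    "consistent_sports('soccer', 'baseball')", "consistent_sports('soccer', 'hockey')"
)

print(uncopied_arguments(parse_trace(GOOD_TRACE), "consistent_sports"))
print(uncopied_arguments(parse_trace(BAD_TRACE), "consistent_sports"))
```

A check like this can only flag arguments that appear nowhere among earlier outputs. It would not catch an argument copied from the wrong earlier step, and it says nothing about local errors, where a step's own output is wrong.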