Self-Harmonized Chain of Thought

Ziqi Jin, Wei Lu

TL;DR

This work tackles the challenges of chain-of-thought prompting, notably the instability of zero-shot reasoning and the labor-intensive need for human-crafted demonstrations in few-shot CoT. It introduces ECHO, a self-harmonized CoT approach that clusters questions, samples representative demonstrations, and iteratively unifies their rationales to produce a coherent reasoning pattern, drawing on cognitive load theory. Across arithmetic, commonsense, and symbolic reasoning, ECHO outperforms Auto-CoT and rivals Few-shot-CoT, with ablations confirming the benefits of diversity reduction and demonstration unification. The results suggest that unifying diverse reasoning patterns yields more robust automated reasoning in large language models, albeit with higher inference cost and some domain-dependent limitations.

Abstract

Chain-of-thought (CoT) prompting has demonstrated the capacity of large language models to perform complex reasoning through intermediate steps. While effective, current CoT methods face challenges: Zero-shot-CoT can lead to reasoning errors, and Few-shot-CoT requires labor-intensive manual demonstrations. Auto-CoT attempts to address these issues by automatically generating diverse demonstrations, but this diversity can lead to inconsistent reasoning patterns. We propose ECHO (Self-Harmonized Chain of Thought), a novel method that unifies diverse solution paths into a consistent and effective reasoning pattern. ECHO employs an iterative process to refine and harmonize automatically generated demonstrations, mitigating the limitations of existing approaches. Our comprehensive experiments across arithmetic, commonsense, and symbolic reasoning tasks demonstrate that ECHO outperforms Auto-CoT by an average of 2.8%. These findings suggest that ECHO represents a significant step towards more robust and generalizable automated reasoning in large language models.
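
The iterative harmonization described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the `Q:`/`A:` prompt layout, the helper names `build_prompt` and `echo_unify`, and the choice of the first question in each cluster as its representative are all assumptions made for the example. Clustering is assumed to have been done beforehand. The default of 4 iterations follows the value the paper reports as best overall.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (question, rationale)


def build_prompt(examples: List[Demo], question: str) -> str:
    """Format in-context examples followed by the target question.

    The Q:/A: layout is an assumption for this sketch; the trailing
    trigger phrase is the standard Zero-shot-CoT prompt.
    """
    parts = [f"Q: {q}\nA: {r}" for q, r in examples]
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


def echo_unify(
    clusters: List[List[str]],
    llm: Callable[[str], str],
    iterations: int = 4,
) -> List[Demo]:
    """Sketch of self-harmonization over one demonstration per cluster."""
    # Step 1: pick one representative question per cluster
    # (hypothetical rule: simply take the first question).
    questions = [cluster[0] for cluster in clusters if cluster]

    # Step 2: initialize each rationale with Zero-shot-CoT (no examples).
    demos: List[Demo] = [(q, llm(build_prompt([], q))) for q in questions]

    # Step 3: repeatedly regenerate one rationale at a time, using all
    # other demonstrations as in-context examples.
    for _ in range(iterations):
        for i, (question, _old_rationale) in enumerate(demos):
            others = demos[:i] + demos[i + 1:]
            demos[i] = (question, llm(build_prompt(others, question)))
    return demos
```

Because each regenerated rationale is conditioned on the current versions of the others, the demonstrations can gradually drift toward a shared reasoning pattern. The returned list would then serve as the few-shot examples at inference time.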

Paper Structure

This paper contains 29 sections, 3 equations, 5 figures, 20 tables, 1 algorithm.

Figures (5)

  • Figure 1: A comparison between ECHO and other CoT baselines. "Zero-CoT" is short for Zero-shot-CoT and "Few-CoT" is short for Few-shot-CoT. The demonstrations generated by Auto-CoT and ECHO will be applied as few-shot examples during inference.
  • Figure 2: Overview of our ECHO method. In the demonstration unification process, ECHO iteratively re-generates the rationale of one demonstration with other demonstrations as in-context examples.
  • Figure 3: Performance of ECHO initialized with manual prompts and with Zero-shot-CoT-generated prompts after 0, 1, and 3 iterations. The generated prompts perform better after the unification process, while the manual ones perform better when applied directly.
  • Figure 4: Performance of ECHO in different domains as the number of iterations increases exponentially. $T=4$ gives the best overall performance.
  • Figure 5: Performance of ECHO and Auto-CoT under three demonstration settings: one from each cluster, randomly sampled, and all from the same cluster. ECHO with more iterations gains more from diverse demonstrations.