AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations

Zhicheng Yang; Yinya Huang; Jing Xiong; Liang Feng; Xiaodan Liang; Yiwei Wang; Jing Tang

TL;DR

AlignedCoT addresses prompt sensitivity in LLM chain-of-thought prompting by eliciting the model’s native reasoning style through probing, refinement, and formatting, achieving high-quality zero-shot CoTs without handcrafted demonstrations. The method yields consistent reasoning improvements across multiple benchmarks and models, enhances detection of logical pitfalls, and remains compatible with retrieval-augmented generation and smaller LMs. Empirical results show notable gains over baselines like standard CoT, Auto-CoT, and Complex CoT, and GSM8K-Align enables improved RAG performance. The work suggests that prompting LLMs in their native linguistic style can unlock embedded knowledge more effectively, reducing the need for manual exemplars and enabling broader applicability.
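The three-stage procedure summarized above (probe the model zero-shot, refine the resulting CoT, then unify its format) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable type, the trigger phrase, the `Q:`/`A:` layout, the closing `The answer is ...` line, and the helper names `probe`, `refine`, `unify_format`, and `build_prompt` are all hypothetical, and `fake_llm` is a toy stand-in so the sketch runs offline.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Assumed zero-shot trigger phrase; the wording used by AlignedCoT may differ.
TRIGGER = "Let's think step by step."


def probe(llm: LLM, question: str) -> str:
    """Step 1 (probing): let the model answer zero-shot, in its own native style."""
    return llm(f"Q: {question}\nA: {TRIGGER}")


def refine(cot: str, fix: Callable[[str], str]) -> str:
    """Step 2 (refining): correct errors in the probed CoT.
    `fix` is a hypothetical placeholder for the (manual) correction step."""
    return fix(cot)


def unify_format(question: str, cot: str, answer: str) -> str:
    """Step 3 (formatting): give every demonstration the same layout (layout assumed)."""
    steps = [line.strip() for line in cot.splitlines() if line.strip()]
    return "\n".join([f"Q: {question}", f"A: {TRIGGER}", *steps, f"The answer is {answer}."])


def build_prompt(demos: List[str], test_question: str) -> str:
    """Join the formatted demonstrations, then append the unanswered test question."""
    return "\n\n".join(demos + [f"Q: {test_question}\nA: {TRIGGER}"])


# Toy stand-in for an LLM (uneven indentation and a trailing newline to show cleanup).
def fake_llm(prompt: str) -> str:
    return "Tom starts with 3 apples.\n  He eats 1, so 3 - 1 = 2 are left.\n"


question = "Tom has 3 apples and eats 1. How many are left?"
cot = probe(fake_llm, question)
cot = refine(cot, fix=lambda text: text)  # nothing to correct in this toy case
demo = unify_format(question, cot, "2")
prompt = build_prompt([demo], "Ann has 5 pens and loses 2. How many are left?")
print(prompt)
```

In this sketch, step 3 only normalizes layout and leaves the model's own wording untouched, which mirrors the idea of keeping demonstrations in the model's native style.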

Abstract

Large language model (LLM) prompting, such as the use of in-context demonstrations, is a mainstream technique for invoking LLMs to perform high-performance, solid complex reasoning (e.g., mathematical reasoning and commonsense reasoning), and has the potential to support further human-machine collaborative scientific discovery. However, current LLMs are sensitive and unpredictable with respect to prompt wording and style, and there is an unseen gap between how LLMs understand text and how humans write prompts. This paper introduces AlignedCoT, an LLM-acquainted prompting technique that uses proficient "native-speaking" demonstrations for in-context learning. Specifically, it produces consistent and correct step-wise prompts in zero-shot scenarios by progressively probing, refining, and formatting the LLM's chains of thought, so that the prompt is free from handcrafted few-shot demonstrations while maintaining its quality. We conduct experiments on mathematical reasoning and commonsense reasoning and find that LLMs with AlignedCoT perform significantly better than with human-crafted demonstrations. We further apply AlignedCoT to rewrite the GSM8K training set, producing the GSM8K-Align dataset, and observe its benefits for retrieval-augmented generation. The code and data can be found at https://github.com/yangzhch6/AlignedCoT.
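The abstract does not specify how GSM8K-Align examples are retrieved for retrieval-augmented generation. The sketch below assumes one simple option: rank stored (question, CoT) pairs by bag-of-words cosine similarity to the test question and prepend the top-k as demonstrations. The similarity measure, the record fields `question` and `cot`, the helpers `retrieve` and `rag_prompt`, and the toy records are all hypothetical placeholders, not entries from the real dataset.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]  # hypothetical schema: {"question": ..., "cot": ...}


def tokenize(text: str) -> Counter:
    """Lower-cased bag of alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(pool: List[Record], question: str, k: int = 2) -> List[Record]:
    """Return the k stored records whose questions are most similar to `question`."""
    query = tokenize(question)
    ranked = sorted(pool, key=lambda r: cosine(query, tokenize(r["question"])), reverse=True)
    return ranked[:k]


def rag_prompt(pool: List[Record], question: str, k: int = 2) -> str:
    """Prepend the retrieved (question, CoT) pairs as demonstrations."""
    demos = [f"Q: {r['question']}\nA: {r['cot']}" for r in retrieve(pool, question, k)]
    return "\n\n".join(demos + [f"Q: {question}\nA:"])


# Toy placeholder records, NOT real GSM8K-Align entries.
pool = [
    {"question": "A baker sells 12 muffins a day. How many in 3 days?",
     "cot": "She sells 12 * 3 = 36 muffins. The answer is 36."},
    {"question": "A train travels 60 km per hour. How far in 2 hours?",
     "cot": "It travels 60 * 2 = 120 km. The answer is 120."},
    {"question": "A baker sells 8 cakes a day. How many in 5 days?",
     "cot": "She sells 8 * 5 = 40 cakes. The answer is 40."},
]
test_question = "A baker sells 10 pies a day. How many in 4 days?"
print(rag_prompt(pool, test_question, k=2))
```

A dense embedding model could replace `tokenize` and `cosine` without changing the rest of the prompt assembly.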

Paper Structure

This paper contains 31 sections, 9 figures, and 10 tables.

Introduction
Related Work
AlignedCoT Prompting
Probing Native-Style of LLM
Refining CoTs
Unifying the Format of CoTs
Experiments
Experimental Setup
Main Results
AlignedCoT for Logical Pitfalls
Ablation Study
AlignedCoT with Smaller LMs
AlignedCoT for RAG
Case Study
Conclusion
...and 16 more sections

Figures (9)

Figure 1: A human or machine (A) tends to accept text written in its own style (A's text style) rather than in someone else's (B's text style). In this work, we investigate efficient CoT demonstrations by drawing on LLM-learned text habits (an LLM style).
Figure 2: A. Existing few-shot demonstrations are conventionally dataset samples or human-crafted examples ("Manual-Style"). As a result, an LLM tends to mechanically copy the "Manual-Style" format. B. The proposed AlignedCoT prompt uses zero-shot CoTs that are correct and in an LLM-acquainted format ("Native-Style"). AlignedCoT is obtained in three steps: (1) probing the LLM's native style in zero-shot scenarios; (2) refining the generated CoTs to correct errors from the first step; (3) formatting the CoTs produced by the first two steps.
Figure 3: Illustration of our refining process. The modifications in red are annotated manually. Each time, we correct the first error and then query the LLM to regenerate the text following the last corrected error.
Figure 4: In the case of sampling diverse reasoning paths on GSM8K, our AlignedCoT also outperforms Complex CoT.
Figure 5: Two cases of logical error detection. The text in brown is GPT-4's reasoning process for discovering logical incorrectness.
...and 4 more figures
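The refining loop in Figure 3 (fix the first error, let the LLM regenerate everything after it, and repeat) can be sketched as below. In the paper the corrections are made manually; here a hypothetical arithmetic checker, `check_arithmetic`, stands in for the human annotator, and a lookup-table stub, `fake_continue`, stands in for the LLM. The function names, the list-of-steps representation, and the `max_rounds` cap are assumptions for illustration.

```python
import operator
import re
from typing import Callable, List, Optional, Tuple

# Returns (index of first wrong step, corrected step), or None if no error is found.
Annotator = Callable[[List[str]], Optional[Tuple[int, str]]]
# Given the question and the corrected prefix, returns the steps that follow it.
Continuer = Callable[[str, List[str]], List[str]]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
EQUATION = re.compile(r"(\d+) ([+*-]) (\d+) = (\d+)")


def check_arithmetic(steps: List[str]) -> Optional[Tuple[int, str]]:
    """Hypothetical stand-in for the human annotator: flag the first wrong 'a op b = c'."""
    for i, step in enumerate(steps):
        m = EQUATION.search(step)
        if m is None:
            continue
        a, op, b, c = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
        correct = OPS[op](a, b)
        if correct != c:
            # Replace only the wrong result, keeping the rest of the step's wording.
            return i, step[: m.start(4)] + str(correct) + step[m.end(4):]
    return None


def refine_cot(question: str, steps: List[str], annotate: Annotator,
               continue_cot: Continuer, max_rounds: int = 10) -> List[str]:
    """Fix the first error, let the model regenerate everything after it, and repeat."""
    for _ in range(max_rounds):
        found = annotate(steps)
        if found is None:
            return steps  # no remaining errors
        idx, fixed = found
        prefix = steps[:idx] + [fixed]  # keep the text before the error plus the fix
        steps = prefix + continue_cot(question, prefix)
    return steps


# Toy LLM stub: maps the last corrected step to a scripted continuation.
SCRIPT = {
    "3 * 4 = 12.": ["2 + 12 = 15.", "The answer is 15."],  # continuation still has an error
    "2 + 12 = 14.": ["The answer is 14."],
}


def fake_continue(question: str, prefix: List[str]) -> List[str]:
    return SCRIPT[prefix[-1]]


initial = ["3 * 4 = 7.", "2 + 7 = 9.", "The answer is 9."]
refined = refine_cot("What is 2 + 3 * 4?", initial, check_arithmetic, fake_continue)
print(refined)  # → ['3 * 4 = 12.', '2 + 12 = 14.', 'The answer is 14.']
```

Because only the first error is fixed before regenerating, the later steps are rewritten by the model rather than by the annotator, which presumably keeps most of the final CoT in the model's own style.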
