Why Can Large Language Models Generate Correct Chain-of-Thoughts?

Rasul Tutunov; Antoine Grosnit; Juliusz Ziomek; Jun Wang; Haitham Bou-Ammar

TL;DR

The paper develops a theoretical framework to explain why large language models can generate correct chain-of-thoughts via few-shot prompting. It introduces a two-level hierarchical latent language framework that models evolving contexts and intentions and proves a geometric convergence bound showing that the discrepancy between LLM-CoT likelihood and true-language CoT likelihood decays with the number of CoT exemplars. The key result formalizes how CoT exemplars help the model infer the underlying reasoning context, with the rate of decay governed by language ambiguity and, in extensions, context priors. The work provides principled guidance for designing CoT prompts and sheds light on the conditions under which CoT prompting yields reliable step-by-step reasoning, while outlining concrete future directions for empirical validation and broader prompting strategies.
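Here is a toy Bayesian sketch of the idea that exemplars help the model infer the context. It is not the paper's construction. We assume a small finite set of contexts (the labels `arith`, `trivia` and `poetry` are hypothetical) and fixed per-exemplar likelihoods under each context. Exemplars are treated as conditionally independent given the context. Under these assumptions, the posterior odds of any wrong context against the true one equal the prior odds times the per-exemplar likelihood ratio raised to the power $N$. The mass on wrong contexts therefore shrinks geometrically in the number of exemplars, which is the qualitative behaviour the summary describes.

```python
import math

def posterior_over_contexts(prior, exemplar_likelihoods, n_exemplars):
    """Posterior p(c | n exemplars) over a finite set of hypothetical contexts.

    prior: dict context -> prior probability q(c) (assumed values).
    exemplar_likelihoods: dict context -> probability of one observed exemplar
        under that context (assumed identical for every exemplar).
    Exemplars are assumed conditionally independent given the context.
    """
    # Work in log space so large n_exemplars does not underflow.
    log_unnorm = {
        c: math.log(prior[c]) + n_exemplars * math.log(exemplar_likelihoods[c])
        for c in prior
    }
    m = max(log_unnorm.values())
    weights = {c: math.exp(v - m) for c, v in log_unnorm.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical numbers: "arith" plays the role of the true context c*.
prior = {"arith": 0.2, "trivia": 0.5, "poetry": 0.3}
lik = {"arith": 0.6, "trivia": 0.3, "poetry": 0.1}

for n in (0, 1, 2, 4, 8):
    post = posterior_over_contexts(prior, lik, n)
    # Posterior mass left on the wrong contexts after n exemplars.
    print(n, round(1 - post["arith"], 4))
```

The prior initially favours the wrong context `trivia`. Its odds against `arith` are multiplied by $0.3/0.6 = 0.5$ per exemplar, so a handful of exemplars is enough to overturn the prior.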

Abstract

This paper examines the capabilities of large language models (LLMs), focusing on advancing the theoretical understanding of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To this end, we introduce a two-level hierarchical graphical model tailored to natural language generation. Within this framework, we establish a geometric convergence rate that bounds how closely the likelihood of an LLM-generated chain of thoughts matches that of a chain originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts, (potentially) explaining performance gains in tasks demanding reasoning skills.

Paper Structure (22 sections, 2 theorems, 29 equations, 1 figure, 1 table)

Introduction
Theoretical Attempts of In-Context Learning:
Chain-of-Thoughts Prompting:
Contributions of This Study:
Chain-of-Thoughts Formulation
Chains of Thoughts in Natural Language.
An example of CoT:
An example of CoT:
Probabilistic Graphical Model:
Chain of Thoughts in LLMs
LLMs as marginal approximators
Inferring context from CoT prompting
Natural Language Ambiguity:
LLMs can Produce Correct CoTs
LLMs can Produce Correct CoTs
...and 7 more sections

Key Result

Theorem 3.2

Consider a collection of $N$ varying-length chain-of-thought examples $\boldsymbol{Z}_k = (\boldsymbol{z}_{k,r})_{0 \le r\le m_k}$ generated from $(\boldsymbol{\theta}^*_{k,r})_{0\le r\le m_k}$ with a context $\boldsymbol{c}^* \sim q(\boldsymbol{c})$ that satisfies the assumption referenced as assumption_uniform_co with $\eta = 2\,\epsilon(\boldsymbol{x}_0)/\bigl(1-\epsilon(\boldsymbol{x}_0)\bigr)$ depending on t…
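The constant $\eta$ in the theorem can be explored numerically. The sketch below evaluates $\eta = 2\epsilon(\boldsymbol{x}_0)/(1-\epsilon(\boldsymbol{x}_0))$ for a few ambiguity values. For $0 \le \epsilon < 1$, rearranging $2\epsilon < 1-\epsilon$ shows that $\eta < 1$ exactly when $\epsilon(\boldsymbol{x}_0) < 1/3$. The statement above is truncated before the bound itself. For that reason, the hypothetical helper `exemplars_needed` assumes only a generic geometric bound $C\rho^N$ with placeholder constants $C$ and $0<\rho<1$. It is not the paper's exact inequality and does not claim that $\rho = \eta$.

```python
import math

def eta_from_eps(eps: float) -> float:
    """eta = 2 * eps / (1 - eps), as read from the theorem statement (0 <= eps < 1)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return 2.0 * eps / (1.0 - eps)

def exemplars_needed(C: float, rho: float, tol: float) -> int:
    """Smallest N with C * rho**N <= tol for a generic geometric bound.

    Hypothetical helper: C and rho are placeholders (0 < rho < 1), not the
    constants of the paper's theorem.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("a geometric bound needs 0 < rho < 1")
    if C <= tol:
        return 0
    # C * rho**N <= tol  <=>  N >= log(tol / C) / log(rho), both logs negative.
    return math.ceil(math.log(tol / C) / math.log(rho))

for eps in (0.1, 0.2, 0.3, 0.4):
    eta = eta_from_eps(eps)
    print(f"eps={eps:.1f}  eta={eta:.3f}  eta<1: {eta < 1}")

print(exemplars_needed(C=1.0, rho=0.5, tol=0.1))
```

Halving the tolerance costs one extra exemplar when $\rho = 1/2$. This is the practical reading of a geometric rate: the required number of exemplars grows only logarithmically in $1/\text{tol}$.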

Figures (1)

Figure 1: Probabilistic graphical model of natural language text generation that is compatible with the generation of chains of thoughts. $\boldsymbol{c}$ is a context, $(\boldsymbol{\theta}_i)_{0\leq i \leq M}$ is a sequence of intentions, and $(\boldsymbol{x}_i)_{0\leq i \leq M}$ is the sequence of messages corresponding to the formulated thoughts. Generation ends when the stop token $x_M = \langle \text{END} \rangle$ is output.
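The generative process in Figure 1 can be sketched as ancestral sampling under stated assumptions. First a context is drawn. Next comes a chain of intentions in which each $\boldsymbol{\theta}_i$ depends on the context and the previous intention. Each message $\boldsymbol{x}_i$ is then drawn from its intention alone. All tables below (`CONTEXT_PRIOR`, `INTENTION_TRANSITIONS`, `MESSAGES`) are hypothetical toy values, and the paper's exact conditional dependencies may differ. As in the caption, sampling stops once the stop token is emitted.

```python
import random

END = "<END>"

# Hypothetical toy distributions; none of these tables come from the paper.
CONTEXT_PRIOR = {"arithmetic": 0.7, "chitchat": 0.3}  # q(c)
INTENTION_TRANSITIONS = {  # assumed p(theta_i | c, theta_{i-1}); None = start
    ("arithmetic", None): {"restate": 1.0},
    ("arithmetic", "restate"): {"compute": 1.0},
    ("arithmetic", "compute"): {"compute": 0.3, "answer": 0.7},
    ("arithmetic", "answer"): {"stop": 1.0},
    ("chitchat", None): {"greet": 1.0},
    ("chitchat", "greet"): {"greet": 0.2, "stop": 0.8},
}
MESSAGES = {  # assumed p(x_i | theta_i)
    "restate": {"We need 3 + 4 * 2.": 1.0},
    "compute": {"4 * 2 = 8.": 0.5, "3 + 8 = 11.": 0.5},
    "answer": {"The answer is 11.": 1.0},
    "greet": {"Hello!": 0.6, "Hi there.": 0.4},
    "stop": {END: 1.0},
}

def draw(dist, rng):
    """Sample one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]

def sample_chain(rng, max_steps=50):
    """Draw c, then (theta_i, x_i) pairs until the stop token x_M = END appears."""
    c = draw(CONTEXT_PRIOR, rng)
    theta, intentions, messages = None, [], []
    for _ in range(max_steps):
        theta = draw(INTENTION_TRANSITIONS[(c, theta)], rng)
        x = draw(MESSAGES[theta], rng)
        intentions.append(theta)
        messages.append(x)
        if x == END:
            break
    return c, intentions, messages

c, thetas, xs = sample_chain(random.Random(0))
print(c, thetas, xs)
```

Keeping the context fixed for the whole chain is what makes it a latent variable to be inferred. Every intention, and hence every message, carries evidence about the same $\boldsymbol{c}$.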

Theorems & Definitions (5)

Theorem 3.2
proof
Lemma 4.3
proof
proof