Why Can Large Language Models Generate Correct Chain-of-Thoughts?
Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar
TL;DR
The paper develops a theoretical framework to explain why large language models can generate correct chains of thought (CoT) via few-shot prompting. It introduces a two-level hierarchical latent language framework that models evolving contexts and intentions, and proves a geometric convergence bound: the discrepancy between the likelihood of a CoT under the LLM and its likelihood under the true language decays with the number of CoT exemplars. The key result formalizes how CoT exemplars help the model infer the underlying reasoning context, with the rate of decay governed by language ambiguity and, in extensions, context priors. The work offers principled guidance for designing CoT prompts, clarifies the conditions under which CoT prompting yields reliable step-by-step reasoning, and outlines future directions for empirical validation and broader prompting strategies.
Abstract
This paper examines the capabilities of large language models (LLMs), focusing on advancing the theoretical understanding of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To this end, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a geometric convergence rate that bounds the gap between the likelihood of an LLM-generated chain of thoughts and that of a chain originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts, potentially explaining performance gains in tasks demanding reasoning skills.
