CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
Vignesh Kothapalli, Hamed Firooz, Maziar Sanjabi
TL;DR
CoT-ICL Lab is a synthetic, tokenized framework for studying chain-of-thought (CoT) in-context learning (ICL). It decouples the causal structure ${\mathcal{G}}$ that governs chain token generation from the token-processing functions ${\mathcal{H}}$, which enables controlled experiments with reasoning over directed acyclic graphs (DAGs), multi-input ICL sequences, and varied token processors. Decoder-only transformers with up to ${730\times 10^6}$ parameters are trained on the generated datasets. The main findings are that CoT prompts accelerate accuracy transitions across model sizes, that depth is crucial when in-context demonstrations are limited while more examples help shallower models, and that constraining the diversity of token processors can improve causal-structure learning. Embedding/alignment and attention analyses reveal how models infer the DAG. The framework also shows connections to NLP, such as faster adaptation of pre-trained models and sparse attention patterns during reasoning, which makes it a versatile testbed for theoretical and empirical study of CoT-ICL in language tasks. Although the setting is synthetic, the results show how the DAG, the processing functions, and the vocabulary shape in-context learning dynamics, and they offer guidance for future investigations of reasoning in large language models.
Abstract
We introduce CoT-ICL Lab, a framework and methodology to generate synthetic tokenized datasets and systematically study chain-of-thought (CoT) in-context learning (ICL) in language models. CoT-ICL Lab allows fine-grained control over the complexity of in-context examples by decoupling (1) the causal structure involved in chain token generation from (2) the underlying token processing functions. We train decoder-only transformers (up to 700M parameters) on these datasets and show that CoT accelerates the accuracy transition to higher values across model sizes. In particular, we find that model depth is crucial for leveraging CoT with limited in-context examples, while more examples help shallow models match the performance of deeper models. Additionally, limiting the diversity of token processing functions throughout training improves causal structure learning via ICL. We also interpret these transitions by analyzing transformer embeddings and attention maps. Overall, CoT-ICL Lab serves as a simple yet powerful testbed for theoretical and empirical insights into ICL and CoT in language models.
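To make the decoupling described above concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a DAG that assigns each chain token a fixed number of parent positions among the earlier tokens, and it swaps in a random affine map modulo the vocabulary size as a stand-in token processor. All names (`sample_dag`, `make_processor`, `generate_example`, `build_prompt`) and all default sizes are hypothetical choices for illustration.

```python
import random


def sample_dag(n_inputs, chain_len, n_parents, rng):
    """Causal structure: for each chain token, pick parent positions
    among the input tokens and the chain tokens generated before it."""
    dag = []
    for c in range(chain_len):
        available = n_inputs + c  # positions visible to chain token c
        dag.append(sorted(rng.sample(range(available), n_parents)))
    return dag


def make_processor(vocab_size, rng):
    """Token processor (assumption): a random affine map modulo the
    vocabulary size, used here only as a simple stand-in."""
    a = rng.randrange(1, vocab_size)
    b = rng.randrange(vocab_size)
    return lambda parents: (a * sum(parents) + b) % vocab_size


def generate_example(dag, processors, n_inputs, vocab_size, rng):
    """Sample input tokens, then emit chain tokens by applying each
    processor to the tokens at its parent positions."""
    tokens = [rng.randrange(vocab_size) for _ in range(n_inputs)]
    for parents, process in zip(dag, processors):
        tokens.append(process([tokens[p] for p in parents]))
    return tokens[:n_inputs], tokens[n_inputs:]


def build_prompt(n_examples, n_inputs=4, chain_len=3, n_parents=2,
                 vocab_size=64, seed=0):
    """Concatenate several examples that share one DAG and one set of
    processors, so the structure must be inferred from context."""
    rng = random.Random(seed)
    dag = sample_dag(n_inputs, chain_len, n_parents, rng)
    processors = [make_processor(vocab_size, rng) for _ in range(chain_len)]
    sequence = []
    for _ in range(n_examples):
        inputs, chain = generate_example(dag, processors, n_inputs,
                                         vocab_size, rng)
        sequence.extend(inputs + chain)
    return dag, sequence


if __name__ == "__main__":
    dag, sequence = build_prompt(n_examples=3)
    print("parent positions per chain token:", dag)
    print("token sequence:", sequence)
```

In this sketch the DAG and the processors are sampled independently, so either one can be varied while the other is held fixed. For example, reusing the same processors across many prompts while resampling the DAG would mimic, in a very simplified way, the idea of limiting processor diversity.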
