ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs

Pengcheng Wen, Jiaming Ji, Chi-Min Chan, Juntao Dai, Donghai Hong, Yaodong Yang, Sirui Han, Yike Guo

TL;DR

This work investigates how internal thinking patterns influence LLM reasoning across model sizes by introducing ThinkPatterns-21k, a dataset of 21k instruction-response pairs augmented with five thinking patterns (monologue, decomposition, self-ask, self-debate, self-critic). It systematically evaluates these patterns on 3B–32B models using AlpacaEval2 and Arena-Hard benchmarks, revealing that smaller models benefit from structured thinking while larger models perform best with unstructured monologue; self-critic shows particularly robust stability across sizes. The authors provide extensive dataset and training artifacts to support reproducibility and future research into reasoning strategies. The study advances understanding of size-pattern interactions and offers guidance for designing reasoning ecosystems in scalable LLMs.

Abstract

Large language models (LLMs) have demonstrated enhanced performance through the "Thinking then Responding" paradigm, where models generate internal thoughts before final responses (also known as System 2 thinking). However, existing research lacks a systematic understanding of the mechanisms underlying how thinking patterns affect performance across model sizes. In this work, we conduct a comprehensive analysis of the impact of various thinking types on model performance and introduce ThinkPatterns-21k, a curated dataset comprising 21k instruction-response pairs (QA) collected from existing instruction-following datasets with five thinking types. For each pair, we augment it with five distinct internal thinking patterns: one unstructured pattern (monologue) and four structured variants (decomposition, self-ask, self-debate and self-critic), while maintaining the same instruction and response. Through extensive evaluation across different model sizes (3B-32B parameters), we have two key findings: (1) smaller models (<30B parameters) can benefit from most structured thinking patterns, while structured thinking such as decomposition degrades the performance of larger models (32B), and (2) unstructured monologue demonstrates broad effectiveness across different model sizes. Finally, we have released all of our datasets, checkpoints, and training logs for the diverse thinking patterns to support reproducibility, aiming to facilitate further research in this direction.
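
The augmentation described in the abstract, where each instruction-response pair receives five thinking patterns while the instruction and response stay fixed, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the record layout, the `augment_pair` helper, and the `placeholder_generator` stand-in for the LLM call are all hypothetical names introduced here. Only the five pattern names come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List

# The five pattern names are taken from the abstract; everything else is illustrative.
THINKING_PATTERNS = (
    "monologue",      # unstructured
    "decomposition",  # structured
    "self-ask",       # structured
    "self-debate",    # structured
    "self-critic",    # structured
)


@dataclass(frozen=True)
class ThinkingExample:
    """One augmented record (hypothetical layout, not the released schema)."""
    instruction: str
    pattern: str
    thinking: str
    response: str


def augment_pair(
    instruction: str,
    response: str,
    generate_thinking: Callable[[str, str, str], str],
) -> List[ThinkingExample]:
    """Return one example per pattern, keeping instruction and response unchanged."""
    return [
        ThinkingExample(
            instruction=instruction,
            pattern=pattern,
            thinking=generate_thinking(instruction, response, pattern),
            response=response,
        )
        for pattern in THINKING_PATTERNS
    ]


def placeholder_generator(instruction: str, response: str, pattern: str) -> str:
    # Stand-in for the pattern-specific LLM prompt; assumed, not from the paper.
    return f"[{pattern} thinking for: {instruction}]"


examples = augment_pair("What is 2 + 2?", "4", placeholder_generator)
```

Under this sketch, a single pair yields five records that differ only in the `pattern` and `thinking` fields, which is what lets the patterns be compared on otherwise identical training data.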

Paper Structure

This paper contains 26 sections, 2 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Overview of our dataset construction pipeline. The process begins with gathering instruction-response pairs. These are subsequently processed by a large language model with specific prompts to produce five internal thinking patterns: monologue, decomposition, self-ask, self-debate, and self-critic, which are then merged to create our final dataset.
  • Figure 2: Comparison of Different Internal Thinking Patterns. This figure illustrates the contrast between the vanilla instruction-response paradigm and our proposed five internal thought types: (a) vanilla instruction-response; (b) Unstructured Monologue, mimicking natural human internal monologue; (c) Decomposition Thought, which systematically breaks down complex tasks into manageable sub-problems; (d) Self-Ask Thought, implementing Socratic questioning for deeper exploration; (e) Self-Debate Thought, facilitating internal debate dialogue to reach optimal solutions; (f) Self-Critic Thought, incorporating self-evaluation and refinement mechanisms. Each thought pattern demonstrates a unique reasoning pathway and problem-solving strategy.
  • Figure 3: Example of Unstructured Monologue.
  • Figure 4: Example of Decomposition Thought.
  • Figure 5: Example of Self-Ask Thought.
  • ...and 3 more figures