Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning

Santosh Kumar Radha; Yasamin Nouri Jelyani; Ara Ghukasyan; Oktay Goktas

TL;DR

The paper addresses the limitations of static reasoning prompts in large language models by introducing Iteration of Thought (IoT), an autonomous, adaptive prompting framework. IoT pairs an Inner Dialogue Agent (IDA) with an LLM Agent (LLMA) in a closed loop: at each iteration $i$, the IDA generates a context-aware prompt $p_i = C(q, r_{i-1})$ from the query $q$ and the previous response $r_{i-1}$, and the LLMA uses it to produce a refined response $r_i = L(q, p_i, K)$. The framework comes in two variants: AIoT (dynamic termination) and GIoT (a fixed number of iterations). Across GPQA, Game of 24, Mini Crosswords, and HotpotQA-Hard, IoT shows improved accuracy and robustness compared with Chain-of-Thought (CoT) and Tree-of-Thought (ToT) baselines, often achieving higher F1 and EM scores and lower variance. The work highlights IoT’s potential for autonomous, self-guided reasoning, discusses its strengths (transparency, modularity) and weaknesses (risk of premature termination or hallucination), and outlines directions for future work, including larger IDA knowledge bases and tool-assisted validation. Overall, IoT offers a scalable approach to dynamic reasoning in LLMs, with practical implications for autonomous AI systems and reduced human intervention.
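The loop summarized by $p_i = C(q, r_{i-1})$ and $r_i = L(q, p_i, K)$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the agent interfaces (`ida`, `llma`), the toy agents, the `done` flag the LLMA returns to signal completion in AIoT, the `max_iters` cap, and treating $r_0$ as "no response yet" are all hypothetical. In a real system both agents would wrap LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interfaces (illustrative, not the authors' code):
#   ida(q, r_prev) -> p_i          corresponds to p_i = C(q, r_{i-1})
#   llma(q, p_i)   -> (r_i, done)  corresponds to r_i = L(q, p_i, K); the
#                                  knowledge K is assumed to live inside llma,
#                                  and `done` is the LLMA's own stop signal.
InnerDialogueAgent = Callable[[str, Optional[str]], str]
LLMAgent = Callable[[str, str], Tuple[str, bool]]


def iteration_of_thought(
    query: str,
    ida: InnerDialogueAgent,
    llma: LLMAgent,
    max_iters: int,
    autonomous: bool,
) -> List[str]:
    """Run the IDA/LLMA loop and return every intermediate response.

    autonomous=True  ~ AIoT: stop as soon as the LLMA signals completion
                       (or when max_iters is reached, as a safety cap).
    autonomous=False ~ GIoT: ignore the signal and run exactly max_iters.
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    responses: List[str] = []
    previous: Optional[str] = None  # assumption: r_0 means "no response yet"
    for _ in range(max_iters):
        prompt = ida(query, previous)         # p_i = C(q, r_{i-1})
        response, done = llma(query, prompt)  # r_i = L(q, p_i, K)
        responses.append(response)
        previous = response
        if autonomous and done:
            break
    return responses


# Toy stand-ins for the two agents, only to exercise the control flow.
def toy_ida(query: str, previous: Optional[str]) -> str:
    return query if previous is None else previous + "!"


def toy_llma(query: str, prompt: str) -> Tuple[str, bool]:
    # Echo the prompt as the "response"; claim completion once it ends in "!!".
    return prompt, prompt.endswith("!!")


if __name__ == "__main__":
    print(iteration_of_thought("q", toy_ida, toy_llma, max_iters=5, autonomous=True))
    # ['q', 'q!', 'q!!']
    print(iteration_of_thought("q", toy_ida, toy_llma, max_iters=2, autonomous=False))
    # ['q', 'q!']
```

Setting `autonomous=False` mirrors the GIoT behavior of always running a fixed number of iterations, while `autonomous=True` lets the LLMA's own signal end the loop early; that early exit is also where the risk of premature termination comes from.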

Abstract

Iterative human engagement is a common and effective means of leveraging the advanced language processing power of large language models (LLMs). Using well-structured prompts in a conversational manner, human users can effectively influence an LLM to develop more thoughtful and accurate responses. Motivated by this insight, we propose the Iteration of Thought (IoT) framework for enhancing LLM responses by generating "thought"-provoking prompts vis-à-vis an input query and the current iteration of an LLM's response. Unlike static or semi-static approaches, e.g., Chain of Thought (CoT) or Tree of Thoughts (ToT), IoT adapts its reasoning path dynamically, based on evolving context, and without generating alternate explorative thoughts which are ultimately discarded. The three components of the IoT framework are (1) an Inner Dialogue Agent (IDA) responsible for generating instructive, context-specific prompts; (2) an LLM Agent (LLMA) that processes these prompts to refine its responses; and (3) an iterative prompting loop that implements a conversation between the former two components. We introduce two variants of our framework: Autonomous Iteration of Thought (AIoT), where an LLM decides when to stop iterating, and Guided Iteration of Thought (GIoT), which always forces a fixed number of iterations. We investigate the performance of IoT across various datasets, spanning complex reasoning tasks from the GPQA dataset, explorative problem-solving in Game of 24, puzzle solving in Mini Crosswords, and multi-hop question answering from the HotpotQA dataset. Our results show that IoT represents a viable paradigm for autonomous response refinement in LLMs, showcasing significant improvements over CoT and thereby enabling more adaptive and efficient reasoning systems that minimize human intervention.

Paper Structure (14 sections, 7 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
Iteration of thought (IoT)
Framework and implementation
Autonomous iteration of thought (AIoT)
Guided iteration of thought (GIoT)
Results
Assessing IoT on the GPQA questionnaire
Assessing IoT on explorative problem-solving tasks
Assessing IoT on multi-context reasoning and retrieval tasks
Strengths and weaknesses of IoT
Conclusion and future work
Appendix
Examples
Example of AIoT

Figures (6)

Figure 1: Illustration of different prompting strategies for enhancing LLM reasoning capabilities. The Input-Output (IO) method maps input directly to output with no intermediate reasoning. Chain-of-Thought (CoT) wei2022chain prompts introduce a single, linear reasoning path, while Tree-of-Thought (ToT) yao2024tree methods expand on this by exploring multiple reasoning paths in parallel. The proposed Iteration-of-Thought (IoT) framework (this work) introduces an Inner Dialogue Agent (IDA) that dynamically adjusts reasoning paths, enabling adaptive cross-path exploration to improve response accuracy.
Figure 1: Comparison of accuracies (and relative improvements) for different methods on GPQA Diamond Dataset.
Figure 2: Schematic example of processing a sample query with the IoT framework. A simple question is used for illustrative purposes. The guided IoT variant (GIoT) is used here, with the number of iterations set to 2. Each grey box contains one iteration of IoT, with the IDA shown in yellow and the LLMA in green.
Figure 3: Comparison of GPQA evaluation accuracies for different methods.
Figure 4: Performance comparison across different methods (GIoT, AIoT, CoT, IO) on Mini Crossword: Letters, Mini Crossword: Words, and Game of 24 tasks. Box plots represent the distribution of mean accuracy percentages across different trials.
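The GPQA Diamond caption reports accuracies together with "relative improvements". The page does not define that term, so the sketch below assumes the common convention (gain divided by the baseline accuracy) and contrasts it with an absolute gain in percentage points. The function names and the example accuracies are hypothetical, not values from the paper.

```python
def absolute_gain(method_acc: float, baseline_acc: float) -> float:
    """Accuracy difference in percentage points (both inputs in percent)."""
    return method_acc - baseline_acc


def relative_improvement(method_acc: float, baseline_acc: float) -> float:
    """Gain relative to the baseline, in percent.

    Assumption: relative improvement = (method - baseline) / baseline * 100.
    The figure caption does not state which definition the paper uses.
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    # Multiply before dividing to keep simple cases exact in floating point.
    return (method_acc - baseline_acc) * 100.0 / baseline_acc


if __name__ == "__main__":
    # Hypothetical accuracies (percent), not results from the paper.
    method, baseline = 60.0, 50.0
    print(absolute_gain(method, baseline))         # 10.0
    print(relative_improvement(method, baseline))  # 20.0
```

The same 10-point gain reads as a 20% relative improvement over a 50% baseline, so it matters which of the two quantities a figure reports.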