Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning

Kaviraj Pather, Elena Hadjigeorgiou, Arben Krasniqi, Claire Schmit, Irina Rusu, Marc Pons, Kabir Khan

TL;DR

Vis-CoT introduces a human-in-the-loop framework that converts chain-of-thought reasoning into an interactive graph, enabling users to flag, prune, and graft reasoning steps to guide LLMs toward correct conclusions. The system builds a structured DAG from CoT traces, provides a two-pane visualization, and uses user interventions to generate context-aware prompts for continued reasoning. Across GSM8K, StrategyQA, and a custom planning task, Vis-CoT improves accuracy and earns higher usability and trust scores than non-interactive baselines. The work demonstrates that targeted human oversight, when coupled with structured reasoning representations, can substantially improve reliability, interpretability, and collaboration in complex AI reasoning tasks.

Abstract

Large language models (LLMs) show strong reasoning via chain-of-thought (CoT) prompting, but the process is opaque, which makes verification, debugging, and control difficult in high-stakes settings. We present Vis-CoT, a human-in-the-loop framework that converts linear CoT text into an interactive reasoning graph. Users can visualize the logical flow, identify flawed steps, and intervene by pruning incorrect paths and grafting new, user-defined premises. This shifts interaction from passive observation to active collaboration, steering models toward more accurate and trustworthy conclusions. Across GSM8K and StrategyQA, Vis-CoT improves final-answer accuracy by up to 24 percentage points over non-interactive baselines. A user study also shows large gains in perceived usability and trust. Vis-CoT points to a practical path for more reliable, understandable, and collaborative reasoning by combining LLMs with targeted human oversight.
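To make the flag, prune, and graft operations concrete, here is a minimal Python sketch of a reasoning graph built from a linear CoT trace. It is an illustration under stated assumptions, not the authors' implementation: the names `Step`, `ReasoningGraph`, `from_linear_cot`, and `build_prompt` are hypothetical, each step has a single parent (a tree, the simplest kind of DAG), the trace is split one step per line, and the prompt wording is invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    step_id: int
    text: str
    parent: Optional[int] = None   # assumption: one parent per step (a tree)
    flagged: bool = False
    user_provided: bool = False


class ReasoningGraph:
    """Hypothetical container for reasoning steps; not the paper's data model."""

    def __init__(self, question: str) -> None:
        self.question = question
        self.steps: Dict[int, Step] = {}
        self._next_id = 0

    def add_step(self, text: str, parent: Optional[int] = None,
                 user_provided: bool = False) -> int:
        if parent is not None and parent not in self.steps:
            raise KeyError(f"unknown parent step {parent}")
        step_id = self._next_id
        self._next_id += 1
        self.steps[step_id] = Step(step_id, text, parent,
                                   user_provided=user_provided)
        return step_id

    @classmethod
    def from_linear_cot(cls, question: str, trace: str) -> "ReasoningGraph":
        # Assumption: one reasoning step per non-empty line, chained in order.
        graph = cls(question)
        parent: Optional[int] = None
        for line in trace.splitlines():
            line = line.strip()
            if line:
                parent = graph.add_step(line, parent)
        return graph

    def flag(self, step_id: int) -> None:
        """Mark a step as suspect without changing the graph."""
        self.steps[step_id].flagged = True

    def prune(self, step_id: int) -> List[int]:
        """Remove a step and every step derived from it; return removed ids."""
        removed: List[int] = []
        stack = [step_id]
        while stack:
            current = stack.pop()
            if current not in self.steps:
                continue
            stack.extend(s.step_id for s in self.steps.values()
                         if s.parent == current)
            del self.steps[current]
            removed.append(current)
        return sorted(removed)

    def graft(self, parent: int, text: str) -> int:
        """Attach a user-defined premise below an existing step."""
        return self.add_step(text, parent, user_provided=True)

    def path_to(self, step_id: int) -> List[Step]:
        """Return the steps from the root down to step_id."""
        path: List[Step] = []
        current: Optional[int] = step_id
        while current is not None:
            step = self.steps[current]
            path.append(step)
            current = step.parent
        return path[::-1]

    def build_prompt(self, step_id: int) -> str:
        # Assumption: the prompt wording below is illustrative only.
        lines = [f"Question: {self.question}", "Validated reasoning so far:"]
        for i, step in enumerate(self.path_to(step_id), start=1):
            tag = " (provided by the user)" if step.user_provided else ""
            lines.append(f"{i}. {step.text}{tag}")
        lines.append("Continue the reasoning from the last step.")
        return "\n".join(lines)


graph = ReasoningGraph.from_linear_cot(
    "A pen costs $2 and a notebook costs $3. "
    "What do 4 pens and 2 notebooks cost?",
    "4 pens cost 4 * 2 = 8 dollars.\n"
    "2 notebooks cost 2 * 3 = 5 dollars.\n"
    "The total is 8 + 5 = 13 dollars.",
)
graph.flag(1)                 # the user spots the arithmetic slip in step 1
removed = graph.prune(1)      # drops step 1 and the total that depends on it
new_id = graph.graft(0, "2 notebooks cost 2 * 3 = 6 dollars.")
print(graph.build_prompt(new_id))
```

In this sketch, pruning removes the flawed step together with everything derived from it, so the prompt sent back to the model contains only the validated path plus the grafted premise. A fuller version would allow multiple parents per step to represent a general DAG.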

Paper Structure

This paper contains 21 sections, 6 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overall architecture of the Vis-CoT framework.
  • Figure 2: Interactive visualization interface with a global reasoning graph and a node detail panel.
  • Figure 3: Three intervention operations: (a) Flagging, (b) Pruning, and (c) Grafting.
  • Figure 4: Constructing the feedback prompt $C_{\text{prompt}}$ from the validated path and the user-provided step.
  • Figure 5: Efficiency comparison before/after intervention (lower completion time / fewer interventions is better).
  • ...and 2 more figures