Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information

Lukas Struppek, Dominik Hintersdorf, Hannah Struppek, Daniel Neider, Kristian Kersting

TL;DR

The paper introduces Focused Chain-of-Thought (F-CoT), a training-free prompting strategy that splits information extraction from reasoning by providing a structured context (fixed XML-like blocks) before reasoning. This input-centric approach yields 2–3x reductions in generated tokens with comparable reasoning accuracy to standard zero-shot CoT on arithmetic problems, demonstrating that structured inputs can significantly improve inference efficiency. The authors validate F-CoT across multiple model sizes and datasets, explore pre-computed versus self-generated contexts, and show robustness to prompt and format variations while identifying limitations and future directions for integrating structure with broader prompting and multimodal settings. Overall, the work argues that input representation is a powerful, orthogonal lever for efficient, faithful LLM reasoning.

Abstract

Recent large language models achieve strong reasoning performance by generating detailed chain-of-thought traces, but this often leads to excessive token use and high inference latency. Existing efficiency approaches typically focus on model-centric interventions, such as reinforcement learning or supervised fine-tuning, to reduce verbosity. In contrast, we propose a training-free, input-centric approach. Inspired by cognitive psychology, we introduce Focused Chain-of-Thought (F-CoT), which separates information extraction from the reasoning process. F-CoT first organizes the essential information from a query into a concise, structured context and then guides the model to reason exclusively over this context. By preventing attention to irrelevant details, F-CoT naturally produces shorter reasoning paths. On arithmetic word problems, F-CoT reduces generated tokens by 2-3x while maintaining accuracy comparable to standard zero-shot CoT. These results highlight structured input as a simple yet effective lever for more efficient LLM reasoning.
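
The two-stage idea described above (first organize the essential information into a structured context, then reason only over that context) can be sketched as plain prompt construction. This is a minimal illustration, not the authors' implementation: the tag names (`<context>`, `<info_i>`, `<question>`), the instruction wording, and the helper names `build_context_block` and `build_reasoning_prompt` are all assumptions made for this example. The paper's actual prompts are given in its appendix.

```python
from xml.sax.saxutils import escape


def build_context_block(facts, question):
    """Render extracted facts and the question as an XML-like block.

    The tag names used here are illustrative assumptions, not the
    format specified by the paper.
    """
    lines = ["<context>"]
    for i, fact in enumerate(facts, start=1):
        # One numbered element per extracted piece of information.
        lines.append(f"  <info_{i}>{escape(fact)}</info_{i}>")
    lines.append(f"  <question>{escape(question)}</question>")
    lines.append("</context>")
    return "\n".join(lines)


def build_reasoning_prompt(context_block):
    """Wrap the context block in a hypothetical reasoning instruction."""
    return (
        "Answer the question using only the information in the context block.\n\n"
        + context_block
    )
```

In this sketch the facts would come either from a user, from a larger model, or from the reasoning model itself in a separate extraction call. The original natural-language query is deliberately left out of the second prompt, so the model sees only the structured context.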

Paper Structure

This paper contains 30 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Focused Chain-of-Thought reasoning. The model first extracts key information into an XML-like context block and then performs reasoning based on that block. The context can also be pre-defined by the user or generated automatically by a larger LLM. When queried using only the context, large reasoning models produce significantly shorter reasoning traces compared to standard natural-language inputs. In this particular example, Qwen3 14B produces 43% fewer tokens than with standard CoT prompting. Shown prompts are abbreviated; see the appendices on context extraction and context reasoning for the full prompts.
  • Figure 2: Comparison of 0-CoT and our F-CoT using Qwen3 models of various sizes. For F-CoT, two settings are shown: context pre-computed by GPT-5 mini (solid bars) and generated by the model itself (hatched bars). F-CoT results are expressed relative to 0-CoT. While F-CoT matches 0-CoT performance in most cases, it generates substantially fewer tokens, thereby improving inference efficiency. Detailed numerical results are provided in the appendices on pre-computed and self-generated context results.
  • Figure 3: Analysis of reasoning traces during the chain-of-thought, where each sentence is classified as Extraction, Reasoning, or Filler. Blue bars indicate the average share of tokens per category, while green bars show the average number of sentences per category. Although the relative token distribution remains largely unchanged, the number of reasoning and filler sentences is substantially reduced when using our F-CoT compared to 0-CoT. Model: Qwen3-14B; Dataset: MATH-500.
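
The two quantities reported in Figure 3 (token share and sentence count per category) can be computed for a single labeled trace roughly as follows. This is a hedged sketch: the function name `trace_stats` is hypothetical, the labels are assumed to be already assigned, and whitespace splitting stands in for the model tokenizer, so the numbers are only a proxy for real token counts. The figure itself reports averages over many traces.

```python
from collections import Counter

CATEGORIES = ("Extraction", "Reasoning", "Filler")


def trace_stats(labeled_sentences):
    """Return (token_share, sentence_count) per category for one trace.

    `labeled_sentences` is a list of (category, sentence) pairs.
    Tokens are approximated by whitespace-separated words, which is an
    assumption made for this sketch.
    """
    token_counts = Counter()
    sentence_counts = Counter()
    for category, sentence in labeled_sentences:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        sentence_counts[category] += 1
        token_counts[category] += len(sentence.split())

    total_tokens = sum(token_counts.values())
    token_share = {
        c: (token_counts[c] / total_tokens if total_tokens else 0.0)
        for c in CATEGORIES
    }
    return token_share, {c: sentence_counts[c] for c in CATEGORIES}
```

Keeping the two measures separate matters for reading the figure: a trace can lose many reasoning and filler sentences while the relative token distribution across categories stays roughly the same.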