
Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought

Zhikang Chen, Sen Cui, Deheng Ye, Yu Zhang, Yatao Bian, Tingting Zhu

TL;DR

This work tackles the inconsistency and inefficiency of implicit chain-of-thought by introducing EBM-CoT, which uses an energy-based model to calibrate latent thought embeddings during reasoning. By performing Langevin-based refinement of latent states and jointly training an energy function with the language model objective, the approach achieves higher reasoning consistency and competitive accuracy without updating the parameters of the base or assistant language models. Empirical results across mathematical, commonsense, and symbolic tasks show single-chain accuracy approaching that of multi-chain ensembles, with substantial gains in stability and efficiency. The method offers a principled bridge between implicit latent reasoning and explicit generation, enabling more reliable multi-step reasoning in large language models.
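Read together with Figure 4, where $\alpha$ controls the relative strength of the energy-based regularization term $\mathcal{L}_{\mathrm{EBM}}$, the joint training described above can plausibly be summarized as a weighted sum of two losses. This is a hedged reconstruction, not an equation quoted from the paper: the name $\mathcal{L}_{\mathrm{LM}}$ for the language-modeling term and the purely additive form are assumptions.

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{LM}} \;+\; \alpha\,\mathcal{L}_{\mathrm{EBM}}, \qquad \alpha \ge 0
```

Under this reading, $\alpha = 0$ drops the energy term entirely, and larger $\alpha$ pushes training harder toward low-energy (more consistent) latent thoughts.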

Abstract

Large Language Models (LLMs) have demonstrated strong reasoning capabilities through Chain-of-Thought (CoT) prompting, which enables step-by-step intermediate reasoning. However, explicit CoT methods rely on discrete token-level reasoning processes that are prone to error propagation and limited by the expressiveness of the vocabulary, often resulting in rigid and inconsistent reasoning trajectories. Recent research has explored implicit or continuous reasoning in latent spaces, allowing models to perform internal reasoning before generating explicit output. Although such approaches alleviate some limitations of discrete CoT, they generally lack explicit mechanisms to enforce consistency among reasoning steps, leading to divergent reasoning paths and unstable outcomes. To address this issue, we propose EBM-CoT, an Energy-Based Chain-of-Thought Calibration framework that refines latent thought representations through an energy-based model (EBM). Our method dynamically adjusts latent reasoning trajectories toward lower-energy, high-consistency regions in the embedding space, improving both reasoning accuracy and consistency without modifying the base language model. Extensive experiments across mathematical, commonsense, and symbolic reasoning benchmarks demonstrate that the proposed framework significantly enhances the consistency and efficiency of multi-step reasoning in LLMs.
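To make "adjusting latent trajectories toward lower-energy regions" concrete, here is a minimal sketch of standard unadjusted Langevin dynamics applied to a small matrix of latent thought embeddings. It is not the paper's implementation: the step size, noise scale, and step count are illustrative, and `toy_energy` (the spread of the thought tokens around their mean) is a hypothetical hand-written stand-in for the learned EBM.

```python
import numpy as np


def toy_energy(z):
    """Hypothetical stand-in energy: spread of thought tokens around their mean.

    Lower values mean the latent thoughts agree more closely; a learned EBM
    would replace this function.
    """
    return 0.5 * float(np.sum((z - z.mean(axis=0)) ** 2))


def toy_energy_grad(z):
    """Exact gradient of toy_energy (the term through the mean cancels)."""
    return z - z.mean(axis=0)


def langevin_refine(z, grad_energy, step_size=0.1, noise_scale=1.0, n_steps=20, rng=None):
    """Unadjusted Langevin updates on latent thoughts z of shape (num_thoughts, dim).

    Each step: z <- z - (step_size / 2) * dE/dz + noise_scale * sqrt(step_size) * eps,
    with eps ~ N(0, I). noise_scale=1 is the standard Langevin step for
    p(z) proportional to exp(-E(z)); noise_scale=0 is plain gradient descent on E.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z, dtype=float)  # copy so the caller's array is left untouched
    for _ in range(n_steps):
        eps = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size * grad_energy(z) + noise_scale * np.sqrt(step_size) * eps
    return z


rng = np.random.default_rng(0)
pre_ebm = rng.standard_normal((4, 8))  # 4 latent thought tokens, 8-dim (toy sizes)
post_ebm = langevin_refine(pre_ebm, toy_energy_grad, noise_scale=0.0, rng=rng)
print(toy_energy(pre_ebm) > toy_energy(post_ebm))  # True
```

With a learned energy network, the gradient would come from automatic differentiation rather than a closed form, and a nonzero noise scale lets the refinement explore around the initial (Pre-EBM) thoughts instead of collapsing deterministically.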

Paper Structure

This paper contains 44 sections, 40 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Comparison of reasoning paradigms and the proposed energy-based calibration framework. Left Figure: Different forms of reasoning in language models, including No CoT, Explicit CoT, and Implicit CoT. Middle Figure: Our method integrates an Energy-Based Model (EBM) to calibrate the latent thought tokens generated by the assistant model, producing refined soft thoughts with improved coherence and consistency. Right Figure: Experimental results on GSM8K show that our approach, implemented on top of LLaMA-3.1-8B-Instruct, achieves superior accuracy and substantially higher consistency compared to previous CoT variants, effectively matching the performance of multi-CoT reasoning with a single CoT using calibrated thought tokens.
  • Figure 2: Overall architecture of the proposed EBM-CoT framework. Given an assistant instruction and a question, along with special tokens, the assistant model first encodes the input and generates a sequence of latent thought tokens that represent intermediate reasoning steps. These latent thoughts are then projected through a learnable projection module into the embedding space compatible with the base model. An Energy-Based Model (EBM) further refines the projected (Pre-EBM) latent thought tokens via Langevin calibration, assigning lower energy to coherent reasoning states. Finally, the calibrated (Post-EBM) latent thought tokens, together with the base instruction and question, are fed into the frozen Base model to generate explicit reasoning steps and produce the final answer. This pipeline enables consistent and efficient reasoning from question to answer while keeping the parameters of both the base and assistant models frozen.
  • Figure 3: Left Figure: Consistency rate comparison on GSM8K, ASDiv-Aug, and AQuA using Qwen3-8B as the base model and Qwen3-0.6B as the assistant model. Right Figure: Ablation over assistant model sizes on GSM8K using Qwen2.5-7B-Instruct as the base model.
  • Figure 4: Left Figure: Results on ASDiv-Aug using Qwen2.5-7B-Instruct as the base model and Qwen2.5-1.5B-Instruct as the assistant model, with varying numbers of latent thought tokens. Middle Figure: Results on GSM8K using Qwen3-8B as the base model and Qwen3-0.6B as the assistant model, where $\alpha$ controls the relative strength of the energy-based regularization term $\mathcal{L}_{\mathrm{EBM}}$. Right Figure: Comparison across different numbers of reasoning chains $N$ (pass@$N$) using Qwen3-8B as the base model. While larger $N$ slightly improves performance due to answer aggregation under the self-consistency setting, our model already achieves strong single-chain accuracy ($N=1$), highlighting that energy-based calibration substantially enhances reasoning consistency without relying on multi-chain sampling.
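The data flow in Figure 2 can be sketched as a few array operations. This is a shape-level illustration under stated assumptions, not the authors' code: random arrays stand in for the assistant's latent thoughts and for the instruction/question embeddings, the projection is assumed to be a single linear layer, the order [instruction; question; thoughts] fed to the base model is an assumption, and `refine` is a hypothetical placeholder for the EBM's Langevin calibration.

```python
import numpy as np


def project(hidden, W, b):
    """Learnable projection from the assistant hidden size to the base embedding size."""
    return hidden @ W + b


def ebm_cot_inputs(instruction_emb, question_emb, assistant_hidden, W, b, refine):
    """Assemble the frozen base model's input embeddings, following Figure 2.

    assistant_hidden: (num_thoughts, d_assist) latent thoughts from the assistant model.
    refine: callable mapping Pre-EBM thoughts to Post-EBM thoughts.
    Returns (post_ebm, base_input); base_input stacks [instruction; question; thoughts]
    along the sequence axis (assumed order).
    """
    pre_ebm = project(assistant_hidden, W, b)
    post_ebm = refine(pre_ebm)
    base_input = np.concatenate([instruction_emb, question_emb, post_ebm], axis=0)
    return post_ebm, base_input


def one_step(z):
    """Placeholder refinement: one noise-free descent step on a toy 'spread' energy."""
    return z - 0.05 * (z - z.mean(axis=0))


rng = np.random.default_rng(0)
d_assist, d_base, n_thoughts = 16, 32, 4  # hypothetical toy sizes
hidden = rng.standard_normal((n_thoughts, d_assist))  # stand-in assistant latent thoughts
W = 0.1 * rng.standard_normal((d_assist, d_base))
b = np.zeros(d_base)
instruction = rng.standard_normal((5, d_base))  # 5 instruction token embeddings
question = rng.standard_normal((7, d_base))  # 7 question token embeddings

thoughts, base_input = ebm_cot_inputs(instruction, question, hidden, W, b, one_step)
print(base_input.shape)  # (16, 32)
```

Only `W`, `b`, and the energy function would carry trainable parameters in this picture; the base and assistant models enter solely through their (frozen) embeddings and hidden states, which matches the caption's claim that both models' parameters are left unchanged.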