Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models

Yuedong Yang, Xiwen Wei, Mustafa Munir, Radu Marculescu

TL;DR

Fuel Gauge is the first method that extracts a hidden "fuel" signal from a reasoning LMM and uses it to predict Chain-of-Thought (CoT) length ahead of time. Its utility is demonstrated on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking.

Abstract

Reasoning Large Multimodal Models (LMMs) have become the de facto choice for many applications. However, these models rely on a Chain-of-Thought (CoT) process that is lengthy and unpredictable at runtime, often resulting in inefficient use of computational resources (due to memory fragmentation) and sub-optimal accuracy (due to under- and over-thinking). We observe empirically that the CoT process follows a very simple form, whose behavior is independent of the specific generated samples. This suggests that the CoT length can be estimated ahead of time based on a hidden parameter representing the amount of "fuel" available to support the reasoning process. Based on this insight, we propose Fuel Gauge, the first method that extracts this hidden signal and predicts CoT length ahead of time. We demonstrate the utility of Fuel Gauge on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking. Extensive experiments on LMMs across text-only, image-text, and video-text question answering benchmarks demonstrate the effectiveness, generalizability, and practical value of Fuel Gauge. For example, on the GPQA-Diamond benchmark, Fuel Gauge achieves less than half the CoT length prediction error of the baseline; this translates into a 13.37x reduction in memory allocation frequency.
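
To make the link between prediction error and memory allocation frequency concrete, the sketch below counts allocation calls under a deliberately simplified model. Everything in it is an assumption for illustration rather than the paper's serving setup: the fixed `block_size`, the "reserve the predicted length once, then grow block by block" policy, and both function names are hypothetical.

```python
import math


def count_allocations_reactive(cot_length: int, block_size: int) -> int:
    """Hypothetical baseline: request one new block each time the cache fills up."""
    return math.ceil(cot_length / block_size)


def count_allocations_predictive(
    cot_length: int, predicted_length: int, block_size: int
) -> int:
    """Hypothetical predictive policy (an assumption, not the paper's algorithm).

    Reserve enough blocks for the predicted CoT length in a single call,
    then fall back to block-by-block growth only if the prediction was too short.
    """
    reserved_blocks = max(1, math.ceil(predicted_length / block_size))
    allocations = 1  # the single up-front reservation
    shortfall = cot_length - reserved_blocks * block_size
    if shortfall > 0:
        allocations += math.ceil(shortfall / block_size)
    return allocations


# Made-up numbers: a 4000-token CoT served with 16-token blocks.
reactive = count_allocations_reactive(4000, 16)
under_predicted = count_allocations_predictive(4000, 3600, 16)
over_predicted = count_allocations_predictive(4000, 4500, 16)
print(reactive, under_predicted, over_predicted)
```

Under this toy model, the number of allocation calls after the initial reservation grows with the amount by which the prediction falls short, which is why a smaller length-prediction error can reduce allocation frequency. Over-prediction avoids extra calls but reserves memory that is never used, a cost this sketch does not measure.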

Paper Structure

This paper contains 29 sections, 7 equations, 14 figures, 7 tables, 1 algorithm.

Figures (14)

  • Figure 1: Example of the output of a reasoning LMM, which consists of a long CoT section wrapped in the special symbols "<think>" and "</think>", followed by a short Conclusion section.
  • Figure 2: Correlation between Chain-of-Thought (CoT) length and LMM accuracy, collected from Qwen3 [yang2025qwen3], Qwen3VL [qwen3vl], Intern-S1 [bai2025intern], and GLM [zeng2025glm] across multiple text-only, image-text, and video-text benchmarks. Using accuracy as a proxy for task difficulty, we observe a clear negative correlation between CoT length and task difficulty. This trend motivates our hypothesis that CoT length is predictable based solely on the question itself.
  • Figure 3: CoT length estimation using the Fuel Gauge. Numbers in the figure are randomly chosen for illustration purposes. See Section \ref{sec:fuel_gauge_impl} for implementation details. In Stage 1, the hidden signal $S_i$ is extracted using $f_{\text{sig}}$ and the corresponding fuel level is estimated with $f_{\text{fuel}}$. In Stage 2, a linear model is fitted to all predicted fuel-level points, and the zero-crossing point of this line is taken as the final CoT length prediction. Based on this CoT length estimation, we further develop two novel downstream applications (see Section \ref{sec:applications}).
  • Figure 4: CoT length prediction results with the Qwen3-8B model on the GPQA-Diamond benchmark. The prediction of our Fuel Gauge evolves as the CoT progresses, while the baseline fails.
  • Figure 5: CoT length and LMM accuracy for different CoT modulation factors $\eta$. Results are obtained with the Qwen3-4B model on the GPQA-Diamond benchmark. The orange star denotes the baseline case where no CoT modulation is applied. Panel (a) shows that Fuel Gauge controls the CoT length linearly. Panel (b) shows that the change in CoT length translates linearly into a change in accuracy. Panel (c) shows that, given the linearity in panels (a) and (b), accuracy can be controlled linearly with $\eta$.
  • ...and 9 more figures
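
Stage 2 in the Figure 3 caption (fit a line to the predicted fuel levels and take its zero crossing as the CoT length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the fuel estimates are indexed by token position, uses ordinary least squares for the fit, and the function name `estimate_cot_length` and the sample numbers are made up. Stage 1 ($f_{\text{sig}}$ and $f_{\text{fuel}}$) is not modeled here; the fuel levels are taken as given.

```python
from typing import Optional, Sequence


def estimate_cot_length(
    positions: Sequence[float], fuel_levels: Sequence[float]
) -> Optional[float]:
    """Fit fuel = slope * position + intercept by least squares.

    Returns the position where the fitted line reaches zero, or None when
    no usable zero crossing exists (too few points, no spread in positions,
    or a fuel trend that is not decreasing).
    """
    n = len(positions)
    if n != len(fuel_levels) or n < 2:
        return None

    mean_x = sum(positions) / n
    mean_y = sum(fuel_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    if sxx == 0:
        return None  # all points share one position; slope is undefined

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, fuel_levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope >= 0:
        return None  # fuel is not being consumed; the line never reaches zero ahead
    return -intercept / slope


# Made-up fuel estimates taken every 100 tokens (assumed to come from Stage 1).
positions = [0, 100, 200, 300]
fuel = [1.0, 0.9, 0.8, 0.7]
predicted = estimate_cot_length(positions, fuel)
print(predicted)  # close to 1000 for these illustrative points
```

Because the fit can be recomputed whenever a new fuel estimate arrives, the prediction can be refined as the CoT progresses, which matches the evolving behavior described in the Figure 4 caption.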