Therefore I am. I Think

Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani

Abstract

We consider the question: when a large reasoning model makes a choice, does it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape the chain-of-thought in reasoning models. Specifically, we show that a simple linear probe decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases even before a single reasoning token is produced. Activation steering supports this causally: perturbing activations along the decision direction inflates deliberation and flips behavior in many examples (7% to 79%, depending on model and benchmark). Behavioral analysis further shows that, when steering changes the decision, the chain of thought often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
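
To make the probing claim concrete, here is a minimal sketch of the kind of linear probe the abstract describes, trained on pre-generation residual-stream activations. The loader `load_activations`, the layer index, and the evaluation split are illustrative assumptions, not the authors' released code.

```python
# Minimal probing sketch: logistic regression on pre-generation activations.
# load_activations is a hypothetical helper; the paper probes layer 20, among others.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

LAYER = 20

# X: residual-stream activations at the last prompt token (the pre_gen position),
# shape (n_examples, d_model); y: 1 if the model went on to call a tool, else 0.
X, y = load_activations(layer=LAYER, position="pre_gen")  # hypothetical loader

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))

# The probe's weight vector doubles as a candidate "decision direction"
# for steering experiments like the one sketched later in this page.
decision_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
```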

Paper Structure

This paper contains 30 sections, 6 equations, 15 figures, and 5 tables.

Figures (15)

  • Figure 1: Overview of our methodology. Linear probes detect action decisions; we then apply steering vectors and measure the quantitative and behavioral impact on the CoT.
  • Figure 2: Decision predictability using probes at layer 20 for Qwen3-4B and GLM-Z1-9B. Both models exhibit a dip at around 5% of the reasoning trace.
  • Figure 3: Agreement ratio between decisions detected by the layer-20 probe (at various stages of the trace and at the think_end token) and final correctness, for Qwen3-4B and GLM-Z1-9B.
  • Figure 4: Example of injection steering (Qwen3-4B) that forces a tool call when the baseline response is to abstain; a minimal steering sketch follows this list.
  • Figure 5: Probe AUROC across sampled layers and generation positions on When2Call for the two main models, Qwen3-4B and GLM-Z1-9B. In both cases, the strongest probes appear in mid-to-late layers, with strong pre_gen predictability and a dip around 5% to 10% of the reasoning trace.
  • ...and 10 more figures
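
As a companion to the probe above, the sketch below shows one way a decision direction could be injected into the residual stream during generation, assuming a HuggingFace-style decoder and a unit-norm `decision_direction` (e.g. from the probe sketch). The model name, layer index, steering scale, and prompt are illustrative assumptions, not the paper's exact setup.

```python
# Minimal activation-steering sketch: add a scaled decision direction to the
# hidden states of one decoder layer via a forward hook during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # one of the paper's models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

LAYER, SCALE = 20, 8.0  # layer mirrors the probe; scale is a tunable assumption
# decision_direction: unit-norm numpy vector of size d_model, e.g. probe.coef_[0]
direction = torch.tensor(decision_direction, dtype=model.dtype)

def steer(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states;
    # shift them along the decision direction at every position.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * direction.to(hidden.device)
    return ((hidden,) + output[1:]) if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(steer)
try:
    ids = tok.apply_chat_template(
        [{"role": "user", "content": "What's the weather in Paris right now?"}],
        add_generation_prompt=True, return_tensors="pt",
    )
    out = model.generate(ids, max_new_tokens=512)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # always detach the hook after the steered generation
```

Hooking a single mid-layer keeps the intervention aligned with where the probe reads the decision; sweeping SCALE is one natural way to trade off flip rate against coherence of the resulting chain of thought.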