Reasoning as Energy Minimization over Structured Latent Trajectories

David K. Johansson

Abstract

Single-shot neural decoders commit to answers without iterative refinement, while chain-of-thought methods introduce discrete intermediate steps but lack a scalar measure of reasoning progress. We propose Energy-Based Reasoning via Structured Latent Planning (EBRM), which models reasoning as gradient-based optimization of a multi-step latent trajectory $z_{1:T}$ under a learned energy function $E(h_x, z)$. The energy decomposes into per-step compatibility, transition consistency, and trajectory smoothness terms. Training combines supervised encoder-decoder learning with contrastive energy shaping using hard negatives, while inference performs gradient descent or Langevin dynamics over $z$ and decodes from $z_T$. We identify a critical failure mode: on CNF logic satisfaction, latent planning reduces accuracy from $\approx 95\%$ to $\approx 56\%$. This degradation arises from a distribution mismatch, where the decoder is trained on encoder outputs $h_x$ but evaluated on planner outputs $z_T$ that drift into unseen latent regions. We analyze this behavior through per-step decoding, latent drift tracking, and gradient decomposition. To address it, we propose dual-path decoder training and latent anchoring. We further introduce a six-part ablation protocol covering component contributions, trajectory length, planner dynamics, initialization, decoder training distribution, and anchor weight. Experiments on three synthetic tasks show that energy decreases monotonically and induces structured latent trajectories on graph and logic tasks, while remaining flat on arithmetic ($r = 0.073$), indicating a negative result. Code is available at https://github.com/dkjo8/ebr-via-structured-latent-planning.
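The inference procedure described above (minimize $E(h_x, z)$ over a trajectory $z_{1:T}$, optionally with Langevin noise, then decode from $z_T$) can be sketched with a toy quadratic energy. This is a minimal illustration, not the paper's implementation: the energy terms, `plan` function, and all hyperparameters here are hypothetical stand-ins for the learned components.

```python
import numpy as np

def energy(z, h_x, lam=0.5):
    """Toy energy: per-step compatibility with the context h_x
    plus a smoothness penalty on consecutive latent steps."""
    compat = np.sum((z - h_x) ** 2)            # per-step compatibility
    smooth = lam * np.sum((z[1:] - z[:-1]) ** 2)  # transition smoothness
    return compat + smooth

def energy_grad(z, h_x, lam=0.5):
    """Analytic gradient of the toy energy w.r.t. the trajectory z."""
    g = 2.0 * (z - h_x)
    diff = z[1:] - z[:-1]
    g[1:] += 2.0 * lam * diff
    g[:-1] -= 2.0 * lam * diff
    return g

def plan(h_x, T=8, steps=100, lr=0.05, noise=0.0, seed=0):
    """Minimize E over a T-step latent trajectory by gradient descent.
    Setting noise > 0 turns the update into Langevin dynamics."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(T, h_x.shape[-1]))  # random initialization
    for _ in range(steps):
        z -= lr * energy_grad(z, h_x)
        if noise > 0:
            z += np.sqrt(2.0 * lr * noise) * rng.normal(size=z.shape)
    return z  # in the full model, a decoder reads out the answer from z[-1]
```

With this quadratic energy the planner contracts every step toward $h_x$, so the energy curve decreases monotonically, mirroring the behavior reported for the graph and logic tasks; a learned energy has no such guarantee.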

Paper Structure

This paper contains 11 sections, 10 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: EBRM overview. Encode problem $x$ to context $h_x$; minimize $E(h_x,z)$ over latent trajectory $z_{1:T}$ via gradient descent or Langevin dynamics; decode $z_T$ to answer $\hat{y}$.
  • Figure 2: Direct vs planner endpoint performance across tasks. Planning degrades logic accuracy from ${\approx}95\%$ to ${\approx}56\%$, motivating the failure analysis in Section \ref{sec:failure}.
  • Figure 3: Energy during latent planning. Left: Graph --- energy decreases consistently. Center: Logic --- monotonic descent across formulas. Right: Arithmetic --- energy is flat, indicating limited optimization progress.
  • Figure 4: Latent trajectories in PCA space. Left: Graph --- trajectories diverge from a shared start to instance-specific endpoints. Right: Logic --- trajectories from diverse starts converge to a shared terminal cluster.
  • Figure 5: Energy landscapes around $z_T$. Left: Graph --- smooth directional gradients. Center: Logic --- structured surface with clear low-energy basin. Right: Arithmetic --- nearly flat surface with negligible gradient signal.
  • ...and 3 more figures