Learning Iterative Reasoning through Energy Diffusion

Yilun Du; Jiayuan Mao; Joshua B. Tenenbaum

TL;DR

IRED addresses learning-to-reason by formulating reasoning as energy minimization over a learned function $E_\theta({\bm{x}}, {\bm{y}})$. It introduces an annealed sequence of energy landscapes $E^k_\theta$ with denoising and contrastive supervision, enabling stable training and progressive refinement during inference. By adaptively allocating optimization steps across landscapes, IRED generalizes to harder instances across continuous, discrete, and planning tasks. Empirically, it outperforms domain-specific and diffusion-based baselines, with notable gains on challenging problems such as hard Sudoku, large graphs, and ill-conditioned matrices.

Abstract

We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference based on problem difficulty, enabling it to solve problems outside its training distribution -- such as more complex Sudoku puzzles, matrix completion with large value magnitudes, and pathfinding in larger graphs. Key to our method's success are two novel techniques: learning a sequence of annealed energy landscapes for easier inference, and combining score function and energy landscape supervision for faster and more stable training. Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. Code and visualizations are available at https://energy-based-model.github.io/ired/
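The idea of optimizing through a sequence of annealed energy landscapes can be illustrated with a minimal sketch. The code below is not the IRED implementation: it replaces the learned energy $E^k_\theta({\bm{x}}, {\bm{y}})$ with a hand-written one-dimensional toy energy (a quadratic bowl with cosine ripples, Gaussian-smoothed by a noise level `sigma`), and the names `energy`, `energy_grad`, `annealed_minimize`, and all parameter values are assumptions chosen for illustration. The rule of accepting a step only when the energy decreases is likewise an assumption of this sketch.

```python
import math

def energy(x, y, sigma, a=2.0, w=5.0):
    """Toy stand-in for a learned energy: a rippled bowl with its minimum at y == x.

    sigma is the Gaussian smoothing level; a larger sigma damps the ripples,
    which gives a smoother landscape.
    """
    d = y - x
    damp = math.exp(-0.5 * (w * sigma) ** 2)
    return d ** 2 + sigma ** 2 + a * (1.0 - damp * math.cos(w * d))

def energy_grad(x, y, sigma, a=2.0, w=5.0):
    """Analytic gradient of the toy energy with respect to y."""
    d = y - x
    damp = math.exp(-0.5 * (w * sigma) ** 2)
    return 2.0 * d + a * w * damp * math.sin(w * d)

def annealed_minimize(x, y0, sigmas, steps_per_landscape=100, step_size=0.01):
    """Gradient descent on y, visiting the landscapes from smoothest to sharpest."""
    y = float(y0)
    for sigma in sigmas:
        for _ in range(steps_per_landscape):
            candidate = y - step_size * energy_grad(x, y, sigma)
            # Accept the step only if it lowers the energy on the current landscape.
            if energy(x, candidate, sigma) < energy(x, y, sigma):
                y = candidate
    return y

x = 1.0        # input condition; the toy "solution" is y == x
y_init = 4.0   # poor initial guess

# Annealed schedule: smooth landscapes first, the sharp one last.
annealed = annealed_minimize(x, y_init, sigmas=[1.0, 0.5, 0.25, 0.1, 0.0])

# Same total step budget spent on the sharpest landscape only.
direct = annealed_minimize(x, y_init, sigmas=[0.0], steps_per_landscape=500)

print(f"annealed: {annealed:.4f}, direct: {direct:.4f}")
```

In this toy setting, descending only on the sharpest landscape stalls in a nearby local minimum, while the annealed schedule reaches the minimum at `y == x`. Raising `steps_per_landscape` is the sketch's analogue of spending more optimization steps on harder problems.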

Paper Structure (13 sections, 5 equations, 13 figures, 12 tables, 2 algorithms)

Introduction
Related Work
Learning Iterative Reasoning through Energy Optimization
Reasoning as Annealed Energy Minimization
Learning Sequence of Annealed Energy Landscapes
Combined Training and Inference Paradigms
Experiments
Continuous Algorithmic Reasoning
Discrete-Space Reasoning
Planning
Conclusion and Discussions
Experimental Details
Model Architectures

Figures (13)

Figure 1: Reasoning as Energy Diffusion -- IRED formulates a reasoning problem with input ${\bm{x}}$ and output ${\bm{y}}$ as an energy minimization problem over a learned energy function. It can be trained stably for a wide variety of reasoning tasks and achieves strong generalization to harder problem instances through adaptive computation in the optimization process.
Figure 2: IRED Learns a Sequence of Energy Landscapes. At inference time, we optimize for the ${\bm{y}}^*$ that minimizes the energy function, gradually increasing the complexity of the energy optimization problem. The energy functions are trained with a combination of score function supervision and energy landscape supervision.
Figure 3: Optimized Solutions Across Landscapes -- Error maps of intermediate optimized solutions. Optimized solutions at earlier landscapes are less accurate than later ones.
Figure 4: Energy Landscape -- Predicted energy values for ${\bm{y}}$ and the corresponding MSE distance of ${\bm{y}}$ from the problem solution across different landscapes on the matrix inverse task. The earlier energy landscapes are smoother than the later ones.
Figure 5: Optimized Boards Across Landscapes -- Plot of the minimal energy board across energy landscapes, given the same initial board. Later energy landscapes lead to more accurate boards. We highlight inconsistent entries in red.
...and 8 more figures
