Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

Cheng Lu; Huayu Chen; Jianfei Chen; Hang Su; Chongxuan Li; Jun Zhu

TL;DR

This work addresses exact energy-guided diffusion sampling for unnormalized energy functions by deriving the exact intermediate energy $\mathcal{E}_t(\boldsymbol{x}_t)$ and its gradient, then training a neural proxy via Contrastive Energy Prediction (CEP) to estimate $\nabla_{\boldsymbol{x}_t}\mathcal{E}_t$. Given unlimited model capacity and data, CEP is guaranteed to converge to the desired distribution $p(\boldsymbol{x}) \propto q(\boldsymbol{x}) e^{-\beta \mathcal{E}(\boldsymbol{x})}$, and it reduces to InfoNCE and classifier guidance in special cases. The method is instantiated in offline RL as QGPO, which uses in-support CEP and in-support Softmax Q-Learning to compute targets, and achieves strong results on D4RL benchmarks, especially on hard tasks such as AntMaze. It is also demonstrated on image synthesis, where CEP performs comparably to classifier guidance on ImageNet and enables energy-guided control of color appearance, illustrating scalability to high-dimensional data. Overall, CEP provides a principled, exact framework for controllable diffusion sampling that applies to both reinforcement learning and generative modeling.

Abstract

Guided sampling, which embeds human-defined guidance into the sampling procedure, is a vital approach for applying diffusion models to real-world tasks. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge in this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide examples of applying CEP to image synthesis to demonstrate its scalability to high-dimensional data.
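To make the idea of a contrastive objective over energies concrete, here is a minimal NumPy sketch. It assumes the loss is a cross-entropy between two softmax distributions over a batch of $K$ samples: soft labels built from the true energies $\mathcal{E}(\boldsymbol{x}_0^{(i)})$ and predictions built from a model's intermediate energies at the perturbed points. The function names `log_softmax` and `cep_loss`, and the plain-array inputs standing in for a neural network's outputs, are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))

def cep_loss(true_energy, predicted_energy, beta=1.0):
    """Cross-entropy between softmax(-beta * E(x_0)) and softmax(-f(x_t, t)).

    true_energy:      shape (K,), energies E(x_0^(i)) of K clean samples.
    predicted_energy: shape (K,), model outputs f(x_t^(i), t) at the
                      corresponding noised samples (hypothetical model).
    """
    soft_labels = np.exp(log_softmax(-beta * true_energy))
    log_pred = log_softmax(-predicted_energy)
    return -np.sum(soft_labels * log_pred)
```

Under this assumed form, the loss is unchanged when a constant is added to every predicted energy, so the objective can only pin down the energy up to an additive constant. That is sufficient for guidance, which uses only the gradient of the energy.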

Paper Structure (65 sections, 4 theorems, 84 equations, 11 figures, 6 tables, 1 algorithm)


Introduction
Background
Diffusion (Probabilistic) Models
Constrained Policy Optimization in Offline Reinforcement Learning
Exact Energy-Guided Sampling
Exact Formulation of Intermediate Energy Guidance
Learning Energy Guidance by Contrastive Energy Prediction
Comparison with Previous Methods for Guided Sampling
Previous Energy-Guided Samplers are Inexact
MSE for Predicting Energy.
Diffusion Posterior Sampling.
2-D Example.
Relationship with Contrastive Learning and Classifier Guidance
Q-Guided Policy Optimization for Offline Reinforcement Learning
Problem Formulation
...and 50 more sections

Key Result

Theorem 3.1

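Suppose $q_0$ and $p_0$ are defined as in the target-distribution equation, $p_0({\bm{x}}_0) \propto q_0({\bm{x}}_0) e^{-\beta {\mathcal{E}}({\bm{x}}_0)}$. For $t\in(0, T]$, let the two forward processes share the same Gaussian perturbation kernel, $p_{t0}({\bm{x}}_t|{\bm{x}}_0) := q_{t0}({\bm{x}}_t|{\bm{x}}_0) = \mathcal{N}({\bm{x}}_t \,|\, \alpha_t {\bm{x}}_0, \sigma_t^2 {\bm{I}})$. Denote $q_t({\bm{x}}_t):=\int q_{t0}({\bm{x}}_t|{\bm{x}}_0) q_0({\bm{x}}_0)\mathrm{d}{\bm{x}}_0$ and $p_t({\bm{x}}_t):=\int p_{t0}({\bm{x}}_t|{\bm{x}}_0) p_0({\bm{x}}_0)\mathrm{d}{\bm{x}}_0$ as the marginal distributions at time $t$, and define the intermediate energy

$${\mathcal{E}}_t({\bm{x}}_t) := \begin{cases} \beta {\mathcal{E}}({\bm{x}}_0), & t = 0, \\ -\log \mathbb{E}_{q_{0t}({\bm{x}}_0|{\bm{x}}_t)}\left[e^{-\beta {\mathcal{E}}({\bm{x}}_0)}\right], & t > 0. \end{cases}$$

Then $q_t$ and $p_t$ satisfy

$$p_t({\bm{x}}_t) \propto q_t({\bm{x}}_t)\, e^{-{\mathcal{E}}_t({\bm{x}}_t)},$$

and their score functions satisfy

$$\nabla_{{\bm{x}}_t} \log p_t({\bm{x}}_t) = \nabla_{{\bm{x}}_t} \log q_t({\bm{x}}_t) - \nabla_{{\bm{x}}_t} {\mathcal{E}}_t({\bm{x}}_t).$$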
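The relation $p_t \propto q_t e^{-{\mathcal{E}}_t}$ (also stated in the Figure 1 caption) can be checked numerically on a toy problem. The sketch below assumes the intermediate energy has the form ${\mathcal{E}}_t({\bm{x}}_t) = -\log \mathbb{E}_{q_{0t}({\bm{x}}_0|{\bm{x}}_t)}[e^{-\beta {\mathcal{E}}({\bm{x}}_0)}]$ and a Gaussian perturbation kernel $\mathcal{N}(x_t \,|\, \alpha x_0, \sigma^2)$. The three-point data distribution, the energies, and the values of `beta`, `alpha`, and `sigma` are made up purely for illustration.

```python
import numpy as np

def gaussian_kernel(x_t, x_0, alpha, sigma):
    """Density N(x_t | alpha * x_0, sigma^2) for every (x_t, x_0) pair."""
    diff = x_t[:, None] - alpha * x_0[None, :]
    return np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical toy setup: q_0 is a discrete distribution on three points.
x_0 = np.array([-2.0, 0.0, 1.5])      # support of q_0
w = np.array([0.5, 0.3, 0.2])         # probabilities under q_0
energy = np.array([1.0, 0.0, 2.0])    # E(x_0) at each support point
beta, alpha, sigma = 3.0, 0.8, 0.6

x_t = np.linspace(-3.0, 3.0, 61)
kernel = gaussian_kernel(x_t, x_0, alpha, sigma)   # shape (61, 3)

# Marginal of the unguided process: q_t(x_t) = sum_i w_i N(x_t | alpha x_0i, sigma^2).
q_t = kernel @ w

# Target p_0 is proportional to q_0 * exp(-beta * E); diffuse it with the same kernel.
unnorm_p0 = w * np.exp(-beta * energy)
Z = unnorm_p0.sum()
p_t = kernel @ (unnorm_p0 / Z)

# Intermediate energy: E_t(x_t) = -log E_{q(x_0 | x_t)}[exp(-beta * E(x_0))].
posterior = kernel * w / q_t[:, None]              # q(x_0 | x_t), rows sum to 1
E_t = -np.log(posterior @ np.exp(-beta * energy))

# If the relation holds, this ratio is the same constant 1/Z at every x_t.
ratio = p_t / (q_t * np.exp(-E_t))

# Finite-difference scores: grad log p_t should equal grad log q_t - grad E_t.
score_p = np.gradient(np.log(p_t), x_t)
score_q = np.gradient(np.log(q_t), x_t)
grad_E = np.gradient(E_t, x_t)
```

In this toy case `ratio` is constant across the grid and equal to `1 / Z`, and `score_p` matches `score_q - grad_E`, which is the one-dimensional, discrete-data instance of the two identities.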

Figures (11)

Figure 1: A 2-D mixtures-of-Gaussians example of the density functions (unnormalized) for $q_t({\bm{x}}_t)$, $e^{-{\mathcal{E}}_t({\bm{x}}_t)}$ and $p_t({\bm{x}}_t)$ during the diffusion process, where $p_t({\bm{x}}_t)\propto q_t({\bm{x}}_t)e^{-{\mathcal{E}}_t({\bm{x}}_t)}$.
Figure 2: A 2-D example comparing different energy-guided sampling algorithms under varying inverse temperature $\beta$.
Figure 3: Samples by color guidance with red, green, and blue, varying the guidance scale $s$ (under a fixed random seed).
Figure 4: Ablation of gradient scales in D4RL benchmark.
Figure 5: Ablation of diffusion steps in evaluation.
...and 6 more figures

Theorems & Definitions (8)

Theorem 3.1: Intermediate Energy Guidance
Theorem 3.2
Theorem 4.1: CEP in Multiple Time Steps
proof
proof
proof: Proof of \ref{thrm:infoNCE_energy_mixed}
Theorem 6.1
proof
