Energy-Weighted Flow Matching for Offline Reinforcement Learning
Shiyuan Zhang, Weitong Zhang, Quanquan Gu
TL;DR
The paper addresses energy-guided generation, where the target distribution $q(\mathbf{x})$ is shaped by an energy function as $q(\mathbf{x}) \propto p(\mathbf{x}) \exp(-\beta \mathcal{E}(\mathbf{x}))$. It introduces Energy-Weighted Flow Matching (EFM) and Energy-Weighted Diffusion (ED), which learn energy-guided flows and diffusion processes directly, without auxiliary models, and it provides theoretical guarantees that these methods reproduce the energy-guided distribution. It extends these ideas to offline reinforcement learning via Q-weighted Iterative Policy Optimization (QIPO), which combines energy-guided sampling with iterative policy refinement to improve performance on D4RL benchmarks, and it demonstrates faster sampling than some baselines while maintaining or improving effectiveness. The work presents the first exact energy-guided flow matching model and the first energy-guided diffusion model that operates without auxiliary models, giving simpler and more accurate control over generated outcomes, with potential applications in domains such as image synthesis, molecular design, and offline RL. Overall, the proposed framework reduces modeling complexity, improves guided generation, and offers a practical way to incorporate energy-based objectives into diffusion and flow-based generative models.
Abstract
This paper investigates energy guidance in generative modeling, where the target distribution is defined as $q(\mathbf x) \propto p(\mathbf x)\exp(-\beta\mathcal E(\mathbf x))$, with $p(\mathbf x)$ being the data distribution and $\mathcal E(\mathbf x)$ the energy function. To comply with energy guidance, existing methods often require auxiliary procedures to learn intermediate guidance during the diffusion process. To overcome this limitation, we explore energy-guided flow matching, a generalized form of the diffusion process. We introduce energy-weighted flow matching (EFM), a method that directly learns the energy-guided flow without the need for auxiliary models. Theoretical analysis shows that energy-weighted flow matching accurately captures the guided flow. Additionally, we extend this methodology to energy-weighted diffusion models and apply it to offline reinforcement learning (RL) by proposing the Q-weighted Iterative Policy Optimization (QIPO). Empirically, we demonstrate that the proposed QIPO algorithm improves performance in offline RL tasks. Notably, our algorithm is the first energy-guided diffusion model that operates independently of auxiliary models and the first exact energy-guided flow matching model in the literature.
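
To make the target distribution $q(\mathbf x) \propto p(\mathbf x)\exp(-\beta\mathcal E(\mathbf x))$ concrete, the following is a minimal sketch of how weights proportional to $\exp(-\beta\mathcal E(\mathbf x))$ turn samples from $p$ into estimates under $q$. It uses plain self-normalized importance weighting on a one-dimensional toy problem and is not the EFM or QIPO training procedure from the paper. The function names `energy_weights` and `weighted_mean`, the Gaussian choice of $p$, and the quadratic energy are all assumptions made for illustration.

```python
import numpy as np


def energy_weights(energies, beta):
    """Self-normalized weights proportional to exp(-beta * E(x)).

    Hypothetical helper for illustration; not taken from the paper.
    """
    logits = -beta * np.asarray(energies, dtype=float)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def weighted_mean(samples, energies, beta):
    """Estimate E_q[x] from samples of p using energy weights."""
    weights = energy_weights(energies, beta)
    return float(np.sum(weights * np.asarray(samples, dtype=float)))


# Toy setup (assumed): p = N(0, 1) and E(x) = (x - 2)^2 / 2 with beta = 1.
# Then q(x) is proportional to exp(-x^2 / 2 - (x - 2)^2 / 2), which is the
# Gaussian N(1, 1/2), so the mean under q is 1.
rng = np.random.default_rng(0)
samples = rng.standard_normal(200_000)
beta = 1.0
energies = 0.5 * (samples - 2.0) ** 2

weights = energy_weights(energies, beta)
estimate = weighted_mean(samples, energies, beta)
print(f"estimated mean under q: {estimate:.3f}")
```

In this toy case the weighted estimate can be checked against the closed-form mean of $q$. The same weighting idea is what makes the energy function act as guidance: samples with low energy contribute more, and a larger $\beta$ concentrates the weight on them more sharply.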
