Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation

Jiadong Pan, Zhiyuan Ma, Kaiyan Zhang, Ning Ding, Bowen Zhou

TL;DR

This work tackles the challenge of generating images that adhere to physical laws and exhibit reasoning by introducing SRRL, a self-reflective reinforcement learning framework for diffusion models. SRRL combines multi-round reflective denoising with a condition-guided forward process, enabling iterative reasoning across generation trajectories and leveraging process reward models for intermediate feedback. The approach is formalized via the $\mathcal{J}_{RL}(\theta)$ and $\mathcal{J}_{SRRL}(\theta)$ objectives, with PPO optimization guiding updates and a forward-noise mechanism linking successive rounds. In case studies on physical-law and unconventional-phenomena prompts, SRRL produces high-quality, logically coherent images that rival or surpass those of GPT-4o, suggesting potential for education, science visualization, and creative content generation, where reasoning and imagination are crucial.

Abstract

Diffusion models have recently demonstrated exceptional performance in image generation tasks. However, existing image generation methods still struggle with image reasoning, especially in logic-centered image generation tasks. Inspired by the success of Chain of Thought (CoT) and Reinforcement Learning (RL) in LLMs, we propose SRRL, a self-reflective RL algorithm for diffusion models that achieves reasoning generation of logical images by performing reflection and iteration across generation trajectories. The intermediate samples in the denoising process carry noise, making accurate reward evaluation difficult. To address this challenge, SRRL treats the entire denoising trajectory as a CoT step with a multi-round reflective denoising process and introduces a condition-guided forward process, which allows for reflective iteration between CoT steps. Through SRRL-based iterative diffusion training, we introduce image reasoning through CoT, for the first time, into generation tasks that adhere to physical laws or depict unconventional physical phenomena. Notably, case-study results show the superior performance of our SRRL algorithm, even compared with GPT-4o. The project page is https://jadenpan0.github.io/srrl.github.io/.

Paper Structure

This paper contains 31 sections, 13 equations, 12 figures, 1 table, 2 algorithms.

Figures (12)

  • Figure 1: Illustration of a self-reflective reasoning step. Through self-reflective processes of repeated denoising and re-noising, diffusion models achieve image reasoning generation that adheres to physical laws or depicts counterintuitive physical phenomena.
  • Figure 2: Overview of SRRL. SRRL includes two processes: a multi-round reflective denoising process and a condition-guided forward process. These two processes are repeated for $K$ rounds.
  • Figure 3: Reasoning generation of images related to physical laws.
  • Figure 4: Reasoning generation of images related to unconventional physical phenomena.
  • Figure 5: Reasoning generation process of the prompt related to a balance. Initially, the model generates an image of a balance tilted left without objects or tilted right with lighter objects on the left and heavier ones on the right, both following physical laws. Eventually, it learns to create images defying logic: a balance tilts left with no objects on the left and a small ball on the right.
  • ...and 7 more figures
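
The round structure described in the Figure 2 caption (denoise, evaluate, re-noise, repeated for $K$ rounds) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: `denoise`, `reward`, `renoise`, and `reflective_rounds` are hypothetical names, the "image" is a short list of floats, the reward is a simple distance to a known target, and the re-noising step uses the standard DDPM-style forward formula as a stand-in for the paper's condition-guided forward process. The PPO policy update is omitted entirely.

```python
import math
import random


def denoise(x, target, steps=10, rate=0.3):
    """Toy stand-in for one denoising trajectory: pull x toward target."""
    for _ in range(steps):
        x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
    return x


def reward(x, target):
    """Toy reward (assumption): negative squared distance, higher is better."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def renoise(x0, alpha_bar, rng):
    """Standard DDPM-style forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps.

    Used here only as a placeholder for the condition-guided forward process.
    """
    return [
        math.sqrt(alpha_bar) * xi + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for xi in x0
    ]


def reflective_rounds(target, num_rounds=3, alpha_bar=0.5, seed=0):
    """Run num_rounds of denoise -> score -> re-noise; return (round, reward) pairs."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    history = []
    for k in range(num_rounds):
        x0 = denoise(x, target)           # one full trajectory = one "CoT step"
        history.append((k, reward(x0, target)))
        x = renoise(x0, alpha_bar, rng)   # starting point for the next round
    return history


if __name__ == "__main__":
    for k, r in reflective_rounds([1.0, -1.0, 0.5]):
        print(f"round {k}: reward {r:.6f}")
```

Scoring only the fully denoised sample in each round mirrors the abstract's point that noisy intermediate samples are hard to evaluate; how the actual method assigns and uses rewards is defined in the paper, not here.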