
RewardFlow: Generate Images by Optimizing What You Reward

Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou

Abstract

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
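The abstract's core mechanism can be illustrated with a toy sketch of multi-reward Langevin guidance: each sampling step nudges the latent along the weighted sum of reward gradients plus Gaussian noise. This is a minimal illustration, not the paper's implementation; the quadratic toy rewards, the fixed weights, and the temperature parameter are all assumptions standing in for the actual rewards and the prompt-aware adaptive policy.

```python
import numpy as np

# Toy multi-reward Langevin guidance sketch (assumed update rule):
#   x <- x + eta * grad(R_tot)(x) + sqrt(2 * eta * tau) * noise,
# where R_tot = sum_i w_i * R_i. The rewards below are hypothetical
# quadratic pulls toward two targets, not the paper's rewards.

rng = np.random.default_rng(0)

# Each toy reward is R_i(x) = -10 * ||x - t_i||^2, pulling x toward t_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def reward_grads(x):
    # Gradient of -10 * ||x - t||^2 with respect to x is -20 * (x - t).
    return [-20.0 * (x - t) for t in targets]

def langevin_step(x, weights, eta=0.01, tau=0.01):
    # Weighted sum of reward gradients plus temperature-scaled noise.
    g_tot = sum(w * g for w, g in zip(weights, reward_grads(x)))
    noise = rng.normal(size=x.shape)
    return x + eta * g_tot + np.sqrt(2.0 * eta * tau) * noise

x = np.zeros(2)
weights = [0.7, 0.3]  # an adaptive policy would set these per prompt/step
for _ in range(500):
    x = langevin_step(x, weights)

# x fluctuates near the weighted optimum 0.7*t_1 + 0.3*t_2 = (0.7, 0.3)
print(x)
```

In the full method these updates run in a diffusion/flow-matching model's latent space with image-space rewards differentiated through the decoder, and both the weights and step sizes vary over the sampling trajectory.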

Paper Structure

This paper contains 19 sections, 18 equations, 16 figures, 6 tables, and 1 algorithm.

Figures (16)

  • Figure 1: RewardFlow enables accurate, localized, inversion-free image editing and generation using multi-reward Langevin guidance.
  • Figure 2: Gradient localization of our differentiable rewards. We visualize the image-space gradient $\nabla_I R_{tot}(\cdot)$ for various edit prompts. Our proposed rewards prevent semantic leakage by concentrating the gradient precisely on target semantic regions, demonstrating the fine-grained spatial control enabled by RewardFlow.
  • Figure 3: Image editing qualitative comparison across diverse instruction types. RewardFlow produces edits that are both semantically accurate and spatially localized, while better preserving background structure, lighting, and identity compared to prior methods.
  • Figure 4: Text-to-image qualitative results. Across all prompts, RewardFlow produces images that exhibit higher alignment with the textual descriptions while also generating outputs with more visually appealing composition and aesthetics.
  • Figure 5: Gradient localization across reward combinations. Including all rewards concentrates gradients to accurate object contours and eliminates leakage.
  • ...and 11 more figures