Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances
Yuanzhi Liang, Yijie Fang, Rui Li, Ziqi Ni, Ruijie Su, Chi Zhang
TL;DR
This survey addresses the misalignment between common surrogate objectives and perceptual, semantic, and physical realism in visual generation. It positions reinforcement learning (RL) as a principled framework for optimizing non-differentiable, preference-driven, and temporally structured objectives, and organizes contemporary advances across image, video, and 3D generation. Key contributions include a structured account of RL’s evolution, a taxonomy of RL-enhanced generation methods (PPO-based, DPO-based, and GRPO-based), and insights into optimization mechanisms, human-alignment strategies, and world-model integration. The work highlights the practical impact of RL on controllability, temporal consistency, and human-aligned quality, and outlines open challenges and promising directions for future research at the intersection of RL and visual generative modeling.
Abstract
Generative models have made significant progress in synthesizing visual content, including images, videos, and 3D/4D structures. However, they are typically trained with surrogate objectives such as likelihood or reconstruction loss, which are often misaligned with perceptual quality, semantic accuracy, or physical realism. Reinforcement learning (RL) offers a principled framework for optimizing non-differentiable, preference-driven, and temporally structured objectives. Recent advances have demonstrated its effectiveness in enhancing controllability, consistency, and human alignment across generative tasks. This survey provides a systematic overview of RL-based methods for visual content generation. We review the evolution of RL from classical control to its role as a general-purpose optimization tool, and examine its integration into image, video, and 3D/4D generation. Across these domains, RL serves not only as a fine-tuning mechanism but also as a structural component for aligning generation with complex, high-level goals. We conclude with open challenges and future research directions at the intersection of RL and generative modeling.
