AdaGen: Learning Adaptive Policy for Image Synthesis

Zanlin Ni; Yulin Wang; Yeguo Hua; Renping Zhou; Jiayi Guo; Jun Song; Bo Zheng; Gao Huang

Abstract

Recent advances in image synthesis have been propelled by powerful generative models, such as Masked Generative Transformers (MaskGIT), autoregressive models, diffusion models, and rectified flow models. A common principle behind their success is the decomposition of synthesis into multiple steps. However, this introduces a proliferation of step-specific parameters (e.g., the noise level or temperature at each step). Existing approaches typically rely on manually designed rules to manage this complexity, demanding expert knowledge and trial and error. Furthermore, these static schedules cannot adapt to the unique characteristics of each sample, yielding sub-optimal performance. To address this issue, we present AdaGen, a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. Specifically, we formulate the scheduling problem as a Markov Decision Process, where a lightweight policy network determines suitable parameters given the current generation state and can be trained through reinforcement learning. Importantly, we demonstrate that simple reward designs, such as FID or pre-trained reward models, can be easily hacked and may not reliably guarantee the desired quality or diversity of generated samples. Therefore, we propose an adversarial reward design to guide the training of the policy networks. Finally, we introduce an inference-time refinement strategy and a controllable fidelity-diversity trade-off mechanism to further enhance the performance and flexibility of AdaGen. Comprehensive experiments on four generative paradigms validate the superiority of AdaGen. For example, AdaGen achieves better performance on DiT-XL with 3 times lower inference cost and improves the FID of VAR from 1.92 to 1.59 with negligible computational overhead.
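The Markov Decision Process framing in the abstract can be illustrated with a toy rollout. This is only a minimal sketch under stated assumptions, not the paper's implementation: `toy_policy`, `toy_generator_step`, and `toy_reward` are hypothetical stand-ins for the learned policy network, the frozen generative model, and the adversarial reward model, and the "state" is reduced to a step index plus a single scalar.

```python
import math
import random
from typing import List, Tuple

# Hypothetical state: (step index, scalar summary of the partially generated sample).
State = Tuple[int, float]


def toy_policy(state: State, total_steps: int) -> float:
    """Hypothetical policy: maps the current state to one step parameter
    (here, a sampling temperature). A real policy would be a learned network."""
    step, progress = state
    remaining = 1.0 - step / total_steps
    # The action depends on the sample's own state, so it differs per sample.
    return max(0.05, remaining * (1.5 - progress))


def toy_generator_step(progress: float, temperature: float, rng: random.Random) -> float:
    """Stand-in for one step of a frozen generator: moves the state toward 1.0,
    with noise scaled by the temperature chosen by the policy."""
    noise = rng.gauss(0.0, 0.05) * temperature
    return min(1.0, max(0.0, progress + 0.5 * (1.0 - progress) + noise))


def toy_reward(progress: float) -> float:
    """Stand-in for a reward model: a probability-like score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-8.0 * (progress - 0.5)))


def rollout(total_steps: int, seed: int) -> Tuple[List[float], float]:
    """Run one episode: at each step the policy picks a parameter from the state,
    the generator advances, and a single reward is given at the end."""
    rng = random.Random(seed)
    progress = rng.random() * 0.2  # each sample starts from a different state
    actions: List[float] = []
    for t in range(total_steps):
        action = toy_policy((t, progress), total_steps)
        actions.append(action)
        progress = toy_generator_step(progress, action, rng)
    return actions, toy_reward(progress)


if __name__ == "__main__":
    for seed in (0, 1):
        actions, reward = rollout(total_steps=8, seed=seed)
        print(seed, [round(a, 3) for a in actions], round(reward, 3))
```

In an actual reinforcement learning setup, the terminal reward would drive updates to the policy's parameters; that training loop is omitted here.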

Paper Structure (61 sections, 16 equations, 19 figures, 16 tables, 2 algorithms)

Introduction
Related Work
Iterative Generation Models
Reinforcement Learning in Image Generation.
Generative Adversarial Networks and RL.
Proper Scheduling for Generative Models.
Method
Preliminaries
Masked Generative Transformers
Autoregressive Models
Diffusion Models
Rectified Flow Models
Motivation
Motivation I: Automatic Policy Acquisition.
Motivation II: Adaptive Policy Adjustment.
...and 46 more sections

Figures (19)

Figure 1: Main idea of AdaGen. Existing multi-step generative models, such as diffusion models, typically utilize pre-defined, static schedules to configure the generation policy (e.g., noise level) of all samples. Instead, AdaGen leverages reinforcement learning (RL) to train a policy network that directly learns the optimal generation policies adaptively tailored for each sample.
Figure 2: Training pipeline of AdaGen with adversarial reward modeling. The process involves an adversarial interplay: the policy network is optimized to maximize a reward signal, while the reward model is concurrently trained to distinguish real from generated images. The reward signal is the probability of a sample being deemed real by the reward model. Notably, the pre-trained generative model stays frozen throughout the pipeline.
Figure 3: Reward Design. (a) Using FID [salimans2016improved] as the reward. (b) Using a pre-trained reward model (PRM) [xu2024imagereward]. (c) Our main approach with adversarial reward modeling.
Figure 4: Optimization process of AdaGen. When the generation steps increase from $T=8$ to $T=32$, the optimization becomes unstable and yields worse performance. Our proposed action smoothing technique stabilizes convergence and achieves better performance ($T=32$ (Stabilized)).
Figure 5: Action sequence analysis. Left: Learned action sequences at $T = 32$ for three random samples exhibit erratic fluctuations. Right: A simple linear interpolation between the start and end points of the original learned action sequence yields slightly better performance. We visualize sampling temperature $\tau_t$ as an example of the action sequence.
...and 14 more figures
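
The Figure 5 caption mentions a simple linear interpolation between the start and end points of a learned action sequence. A minimal sketch of that baseline follows; the function name and the example values are illustrative assumptions, and this is not the action smoothing technique referred to in Figure 4, whose details are not given here.

```python
from typing import List


def linear_interpolate_endpoints(actions: List[float]) -> List[float]:
    """Replace an action sequence with a straight line from its first
    value to its last value, keeping the same length."""
    n = len(actions)
    if n < 2:
        return list(actions)
    start, end = actions[0], actions[-1]
    return [start + (end - start) * i / (n - 1) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical erratic temperature sequence (values made up for illustration).
    erratic = [4.0, 9.0, 1.0, 0.0, 2.0]
    print(linear_interpolate_endpoints(erratic))
```

Only the two endpoints of the original sequence are preserved; every intermediate fluctuation is discarded.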
