FAIL: Flow Matching Adversarial Imitation Learning for Image Generation

Yeyao Ma, Chen Li, Xiaosong Zhang, Han Hu, Weidi Xie

TL;DR

This work reframes post-training alignment of flow-based image generators as adversarial imitation learning and introduces FAIL, which minimizes the divergence between the policy and expert distributions without explicit rewards or preference pairs. It derives two gradient-based algorithms: FAIL-PD, which backpropagates low-variance pathwise gradients through a differentiable ODE solver, and FAIL-PG, a black-box policy-gradient alternative for discrete or computationally constrained settings. With only 13k expert demonstrations, FAIL shows strong gains on UniGen-Bench and DPG-Bench, and the framework also extends to discrete image generation and video generation. Combined with reward-based objectives, FAIL acts as a regularizer that mitigates reward hacking. Overall, FAIL offers a data-efficient post-training paradigm that improves prompt following and aesthetic alignment, with practical implications for safer and more stable alignment of large-scale generative models.

Abstract

Post-training of flow matching models (aligning the output distribution with a high-quality target) is mathematically equivalent to imitation learning. While Supervised Fine-Tuning mimics expert demonstrations effectively, it cannot correct policy drift in unseen states. Preference optimization methods address this but require costly preference pairs or reward modeling. We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes policy-expert divergence through adversarial training without explicit rewards or pairwise comparisons. We derive two algorithms: FAIL-PD exploits differentiable ODE solvers for low-variance pathwise gradients, while FAIL-PG provides a black-box alternative for discrete or computationally constrained settings. Fine-tuning FLUX with only 13,000 demonstrations from Nano Banana Pro, FAIL achieves competitive performance on prompt following and aesthetic benchmarks. Furthermore, the framework generalizes effectively to discrete image and video generation, and functions as a robust regularizer to mitigate reward hacking in reward-based optimization. Code and data are available at https://github.com/HansPolo113/FAIL.
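
The abstract contrasts pathwise gradients through an ODE solver (FAIL-PD) with a black-box policy gradient (FAIL-PG). The sketch below is not the authors' implementation; it is a one-dimensional toy, written under several assumptions, that only illustrates why the two estimator families differ in variance. The velocity field `dx/dt = theta - x`, the fixed quadratic `critic` (a stand-in for a discriminator, not trained adversarially here), and all function names are hypothetical.

```python
import random
import statistics

TARGET = 2.0  # hypothetical mode that the stand-in critic prefers
STEPS = 8     # Euler steps of the toy sampler
DT = 1.0 / STEPS

def critic(x):
    # Stand-in for a discriminator score; higher means "more expert-like".
    return -(x - TARGET) ** 2

def critic_grad(x):
    # Analytic d critic / d x, used only by the pathwise estimator.
    return -2.0 * (x - TARGET)

def euler_sample(theta, x0):
    # Integrate dx/dt = theta - x from t=0 to t=1 and carry the
    # sensitivity dx/dtheta through the same Euler steps.
    x, sens = x0, 0.0
    for _ in range(STEPS):
        x, sens = x + DT * (theta - x), sens + DT * (1.0 - sens)
    return x, sens

def pathwise_grad(theta, x0):
    # Chain rule through the solver: (d critic / dx) * (dx / dtheta).
    x1, sens = euler_sample(theta, x0)
    return critic_grad(x1) * sens

def score_function_grad(theta, x0):
    # REINFORCE-style estimate: critic(x1) * d log p(x1; theta) / d theta.
    # Here x1 = b*x0 + (1-b)*theta with b = (1-DT)**STEPS, so x1 is Gaussian
    # with mean (1-b)*theta and std b, and the score equals x0*(1-b)/b.
    x1, _ = euler_sample(theta, x0)
    b = (1.0 - DT) ** STEPS
    return critic(x1) * x0 * (1.0 - b) / b

def exact_grad(theta):
    # Closed-form gradient of E[critic(x1)] for this toy problem.
    a = 1.0 - (1.0 - DT) ** STEPS
    return -2.0 * a * (a * theta - TARGET)

def compare(theta=0.0, n=20000, seed=0):
    # Evaluate both estimators on the same noise draws.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pd = [pathwise_grad(theta, z) for z in noise]
    pg = [score_function_grad(theta, z) for z in noise]
    return {
        "pd_mean": statistics.fmean(pd), "pd_var": statistics.variance(pd),
        "pg_mean": statistics.fmean(pg), "pg_var": statistics.variance(pg),
    }

if __name__ == "__main__":
    print("exact gradient:", exact_grad(0.0))
    print(compare())
```

In this toy, both estimators are unbiased for the same gradient, but the pathwise one needs the critic's derivative and a differentiable sampler, while the score-function one only needs critic values. That trade-off is the general reason a black-box estimator suits discrete or constrained settings, at the cost of higher variance.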

Paper Structure (34 sections, 9 equations, 5 figures, 10 tables, 2 algorithms)

Figures (5)

  • Figure 1: (a) We propose FAIL, an adversarial imitation learning framework for flow matching models. (b) With only 13K demonstrations, FAIL significantly improves the performance of the FLUX baseline.
  • Figure 2: Convergence dynamics of FAIL variants. PG converges rapidly but suffers from collapse, whereas PD shows long-term stability.
  • Figure 3: Visualization of FAIL-PD results at different training steps. FAIL-PD shows consistent improvement and distribution alignment with the expert demonstrations.
  • Figure 4: FAIL-PG converges rapidly: it first improves overall quality, then refines fine-grained details.
  • Figure 5: Integrating FAIL with a reward model can alleviate reward hacking.