Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
Yang Chen, Xiaowei Xu, Shuai Wang, Chenhui Zhu, Ruxue Wen, Xubin Li, Tiezheng Ge, Limin Wang
TL;DR
The paper addresses the gap between semantic representation and generative quality in normalizing flows (NFs) by introducing reverse representation alignment (R-REPA), which exploits NF invertibility to align intermediate features along the generative (reverse) path with a pretrained vision encoder. It also proposes a training-free, test-time classification method to probe the NF's semantic knowledge, and extends the approach to latent-space generation via a VAE backbone for high-resolution synthesis. In ablations and experiments on ImageNet at 64×64 and 256×256, R-REPA sets new state-of-the-art results for NFs, accelerates training by over 3.3×, and improves FID, sFID, and classification accuracy over strong baselines. The method is robust across encoders and scales to high resolutions with efficient two-step sampling, giving a principled, invertibility-aware route to higher-fidelity flow-based generation. Code is released for reproducibility and further exploration.
Abstract
Normalizing Flows (NFs) are a class of generative models distinguished by a mathematically invertible architecture, where the forward pass transforms data into a latent space for density estimation, and the reverse pass generates new samples from this space. This characteristic creates an intrinsic synergy between representation learning and data generation. However, the generative quality of standard NFs is limited by poor semantic representations from log-likelihood optimization. To remedy this, we propose a novel alignment strategy that creatively leverages the invertibility of NFs: instead of regularizing the forward pass, we align the intermediate features of the generative (reverse) pass with representations from a powerful vision foundation model, demonstrating superior effectiveness over naive alignment. We also introduce a novel training-free, test-time optimization algorithm for classification, which provides a more intrinsic evaluation of the NF's embedded semantic knowledge. Comprehensive experiments demonstrate that our approach accelerates the training of NFs by over 3.3$\times$, while simultaneously delivering significant improvements in both generative quality and classification accuracy. New state-of-the-art results for NFs are established on ImageNet 64$\times$64 and 256$\times$256. Our code is available at https://github.com/MCG-NJU/FlowBack.
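To make the forward/reverse distinction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a toy flow from affine coupling layers, checks that the reverse pass inverts the forward pass, and evaluates two quantities: the negative log-likelihood from the forward pass, and a cosine alignment term computed on intermediate features of the reverse pass. Everything here is an assumption made for illustration: the coupling architecture, the names `AffineCoupling`, `forward_flow`, `reverse_flow`, `w_enc`, and `w_proj`, the random matrix standing in for a frozen vision encoder, and the choice of a cosine objective. No training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HALF, FEAT = 8, 4, 16  # toy sizes (hypothetical)


class AffineCoupling:
    """Toy invertible layer: the first half conditions an affine map on the second half."""

    def __init__(self, rng):
        self.w_s = 0.1 * rng.standard_normal((HALF, HALF))
        self.w_t = 0.1 * rng.standard_normal((HALF, HALF))

    def forward(self, x):
        x1, x2 = x[:, :HALF], x[:, HALF:]
        s, t = np.tanh(x1 @ self.w_s), x1 @ self.w_t
        y = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
        # Reversing the feature order is a permutation, so it adds 0 to log|det J|.
        return y[:, ::-1], s.sum(axis=1)

    def inverse(self, y):
        y = y[:, ::-1]  # undo the permutation first
        y1, y2 = y[:, :HALF], y[:, HALF:]
        s, t = np.tanh(y1 @ self.w_s), y1 @ self.w_t
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)


layers = [AffineCoupling(rng) for _ in range(4)]


def forward_flow(x):
    """Data -> latent, accumulating the log-determinant for density estimation."""
    logdet = np.zeros(x.shape[0])
    for layer in layers:
        x, ld = layer.forward(x)
        logdet += ld
    return x, logdet


def reverse_flow(z):
    """Latent -> data, collecting intermediate features along the generative path."""
    feats = []
    for layer in reversed(layers):
        z = layer.inverse(z)
        feats.append(z)
    return z, feats


def cosine(a, b):
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    return num / den


x = rng.standard_normal((5, DIM))
z, logdet = forward_flow(x)

# Negative log-likelihood under a standard normal prior (change of variables).
nll = 0.5 * (z ** 2).sum(axis=1) + 0.5 * DIM * np.log(2 * np.pi) - logdet

# Reverse pass: regenerate x from z and keep the intermediate features.
x_rec, feats = reverse_flow(z)

# Stand-in for a frozen pretrained encoder, plus a projection head (both hypothetical).
w_enc = rng.standard_normal((DIM, FEAT))
w_proj = rng.standard_normal((DIM, FEAT))
target = np.tanh(x @ w_enc)

# Align one intermediate reverse-pass feature with the encoder target.
align_loss = 1.0 - cosine(feats[1] @ w_proj, target).mean()
```

In a training loop one would presumably minimize a weighted sum of `nll.mean()` and `align_loss`; the weighting, the layer chosen for alignment, and the projection head are design choices this sketch does not settle.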
