RAC: Rectified Flow Auto Coder

Sen Fang; Yalin Feng; Yanxin Zhang; Dimitris N. Metaxas

TL;DR

RAC is a Rectified Flow Auto Coder, inspired by Rectified Flow, that replaces the traditional VAE. It achieves multi-step decoding by applying the decoder across flow timesteps, and it reduces parameter count by nearly 41% by reusing the decoder as the encoder through time reversal.

Abstract

In this paper, we propose a Rectified Flow Auto Coder (RAC), inspired by Rectified Flow, to replace the traditional VAE. First, RAC achieves multi-step decoding by applying the decoder across flow timesteps; its decoding path is straight and correctable, which enables step-by-step refinement. Second, the model inherently supports bidirectional inference: the decoder serves as the encoder through time reversal (hence "Coder" rather than encoder or decoder), which reduces parameter count by nearly 41%. Third, this generative decoding method improves generation quality because the model can correct latent variables along the path, partially addressing the reconstruction-generation gap. Experiments show that RAC surpasses state-of-the-art VAEs in both reconstruction and generation at approximately 70% lower computational cost.
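To make the multi-step decoding and time-reversed encoding concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a fixed-step Euler solver, and `toy_velocity` is a hypothetical analytic stand-in for the learned time-conditioned decoder. Decoding integrates the velocity field from t = 0 to t = 1; encoding reuses the same field and integrates from t = 1 back to t = 0.

```python
import numpy as np


def toy_velocity(state, t):
    """Hypothetical stand-in for the learned velocity field v(x, t).

    A real RAC decoder would be a neural network; this closed-form field
    only exists so the sketch runs without any trained weights.
    """
    return np.tanh(state) * (1.0 - 0.5 * t) + 0.1


def integrate(state, velocity_fn, num_steps, reverse=False):
    """Euler integration of dx/dt = v(x, t) over t in [0, 1].

    reverse=False walks t from 0 to 1 (decoding direction).
    reverse=True walks t from 1 to 0 with the same field (encoding direction).
    """
    dt = 1.0 / num_steps
    x = np.array(state, dtype=np.float64)
    for k in range(num_steps):
        if reverse:
            t = 1.0 - k * dt
            x = x - dt * velocity_fn(x, t)
        else:
            t = k * dt
            x = x + dt * velocity_fn(x, t)
    return x


def round_trip_error(state, num_steps):
    """Decode, then encode with the same field, and measure the mismatch."""
    decoded = integrate(state, toy_velocity, num_steps)
    recovered = integrate(decoded, toy_velocity, num_steps, reverse=True)
    return float(np.max(np.abs(recovered - np.asarray(state))))


if __name__ == "__main__":
    latent = np.array([0.3, -1.2, 2.0])
    for steps in (4, 256):
        print(steps, round_trip_error(latent, steps))
```

In this toy setting the round trip is only approximate, and the mismatch shrinks as the number of steps grows. The sketch shows why a single set of weights can serve both directions; it says nothing about the accuracy of the actual model.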

Paper Structure (32 sections, 12 equations, 12 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Flow-Based Generative Models
Variational Autoencoders and Representation Learning
Unified Modeling of Generation and Reconstruction
Methodology
Overview and Goal
Time-Conditioned Rectified Flow Decoder
State Construction and Transition
Training Objectives
Algorithm and Implementation Details
Default settings.
Experiments
Setup.
Reconstruction Quality
...and 17 more sections

Figures (12)

Figure 1: Trajectory illustration of RAC. The reconstruction task becomes a conditional generation task; the decoder doubles as the encoder; single-step decoding and encoding become multi-step decoding and encoding.
Figure 1: RAC Training (one iteration)
Figure 2: Method Overview. (i) Training. To prevent latent space collapse, we freeze the VAE encoder and train only the RAC decoder; reverse-time inference then serves as encoding. (ii) State Construction. Extra channels beyond RGB are padded with 0.5, keeping the velocity field shape constant and ensuring bidirectional consistency. (iii) RAC Input. RAC takes time $t$ and the current state as input, driving the transition from latent initialization to the target image.
Figure 3: Reconstruction is Conditional Generation. Earlier reconstructions were more accurate because their latents lay relatively close to the manifold, whereas the latent variables predicted in earlier generation pipelines often landed some distance away from it. This partly explains the past performance gap between generation and reconstruction. Our method, in contrast, theoretically aims for perfect reconstruction, and its multi-step decoding can correct the latent variables provided by a U-Net or DiT. As a result, both reconstruction and generation are significantly superior to traditional VAEs.
Figure 4: Conceptual and empirical views of RAC generation trajectories. Left: a conceptual illustration of trajectory-based generation from latent space to image space. Right: sampled state trajectories projected into a 2D PCA space.
...and 7 more figures
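The state construction described in the Figure 2 caption (extra channels beyond RGB padded with 0.5) can be sketched as follows. This is an illustrative assumption, not the authors' code: the function names, the channels-first layout, and the float image range are all hypothetical.

```python
import numpy as np


def pad_rgb_state(image, num_channels, fill=0.5):
    """Pad a float (3, H, W) RGB array to (num_channels, H, W).

    The extra channels are filled with a constant (0.5 per the caption) so
    the image-side state has the same shape as the latent-side state.
    """
    if image.shape[0] != 3 or num_channels < 3:
        raise ValueError("expected a (3, H, W) image and num_channels >= 3")
    extra_shape = (num_channels - 3,) + image.shape[1:]
    extra = np.full(extra_shape, fill, dtype=image.dtype)
    return np.concatenate([image, extra], axis=0)


def strip_to_rgb(state):
    """Recover the RGB image by dropping the padded channels."""
    return state[:3]


if __name__ == "__main__":
    rgb = np.random.rand(3, 4, 4).astype(np.float32)
    state = pad_rgb_state(rgb, num_channels=8)
    print(state.shape)
```

Because both ends of the trajectory share one shape, the same velocity field can in principle be evaluated in either time direction.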
