RAC: Rectified Flow Auto Coder

Sen Fang, Yalin Feng, Yanxin Zhang, Dimitris N. Metaxas

TL;DR

RAC is an auto coder inspired by Rectified Flow that replaces the traditional VAE. It achieves multi-step decoding by applying the decoder across flow timesteps, and it reduces parameter count by nearly 41%.

Abstract

In this paper, we propose the Rectified Flow Auto Coder (RAC), a replacement for the traditional VAE inspired by Rectified Flow. (1) It achieves multi-step decoding by applying the decoder across flow timesteps; its decoding path is straight and correctable, enabling step-by-step refinement. (2) The model inherently supports bidirectional inference: the decoder serves as the encoder through time reversal (hence Coder rather than encoder or decoder), reducing parameter count by nearly 41%. (3) This generative decoding improves generation quality because the model can correct latent variables along the path, partially addressing the reconstruction-generation gap. Experiments show that RAC surpasses SOTA VAEs in both reconstruction and generation with approximately 70% lower computational cost.
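The bidirectional idea in the abstract can be illustrated with a minimal sketch: one velocity function is integrated forward in time to decode and backward in time to encode. Everything named here is an assumption made for illustration, not the authors' implementation: `integrate`, `toy_velocity`, the plain Euler solver, the step count, and the convention that t = 0 is the latent end and t = 1 is the image end. The linear `toy_velocity` stands in for the trained network.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def integrate(velocity: VelocityFn, x: Vector, t_start: float,
              t_end: float, steps: int) -> Vector:
    """Euler-integrate dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps  # negative when time runs backwards
    t = t_start
    x = list(x)
    for _ in range(steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def toy_velocity(x: Vector, t: float) -> Vector:
    # Hypothetical stand-in for a trained network: a mild linear field.
    return [0.5 * xi + 1.0 for xi in x]


latent = [0.2, -0.4, 1.0]
# "Decode": multi-step integration from t = 0 to t = 1.
decoded = integrate(toy_velocity, latent, 0.0, 1.0, steps=100)
# "Encode": the same function, integrated from t = 1 back to t = 0.
recovered = integrate(toy_velocity, decoded, 1.0, 0.0, steps=100)
round_trip_error = max(abs(a - b) for a, b in zip(latent, recovered))
```

With an Euler solver the round trip is only approximate; in this toy setting the error shrinks as `steps` grows, which is one way to read the claim that the path is correctable step by step.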

Paper Structure (32 sections, 12 equations, 12 figures, 4 tables, 1 algorithm)


Figures (12)

  • Figure 1: Trajectory demonstration of RAC: the reconstruction task becomes a conditional generation task; the decoder doubles as the encoder; single-step decoding and encoding become multi-step decoding and encoding.
  • Figure 1: RAC Training (one iteration)
  • Figure 2: Method Overview. (i) Training. To prevent latent space collapse, we freeze the VAE encoder and train only the RAC decoder; reverse-time inference then serves as encoding. (ii) State Construction. Extra channels beyond RGB are padded with 0.5, keeping the velocity field shape constant and ensuring bidirectional consistency. (iii) RAC Input. RAC takes time $t$ and the current state as input, driving the transition from latent initialization to the target image.
  • Figure 3: Reconstruction is Conditional Generation: Previous reconstructions were more accurate because their latents stayed relatively close to the manifold, whereas previous generation pipelines predicted latent variables that often lay some distance from it. This partly explains the past performance gap between generation and reconstruction. Our method, by contrast, theoretically aims for perfect reconstruction, and its multi-step decoding can correct the latent variables provided by a U-Net or DiT. As a result, both reconstruction and generation are significantly better than with traditional VAEs.
  • Figure 4: Conceptual and empirical views of RAC generation trajectories. Left: a conceptual illustration of trajectory-based generation from latent space to image space. Right: sampled state trajectories projected into a 2D PCA space.
  • ...and 7 more figures
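
The state construction described in the Figure 2 caption (extra channels beyond RGB padded with 0.5) can be sketched as follows. This is a hedged illustration only: the helper names `pad_state` and `strip_state`, the channel-first nested-list layout, and the total channel count are assumptions, not details taken from the paper.

```python
from typing import List

Channel = List[List[float]]  # one H x W plane


def pad_state(rgb: List[Channel], total_channels: int,
              fill: float = 0.5) -> List[Channel]:
    """Append constant-valued channels so the state has total_channels planes."""
    if len(rgb) != 3:
        raise ValueError("expected exactly 3 RGB channels")
    if total_channels < 3:
        raise ValueError("total_channels must be at least 3")
    height, width = len(rgb[0]), len(rgb[0][0])
    # Extra planes carry no image content; they hold the constant fill value.
    extra = [[[fill] * width for _ in range(height)]
             for _ in range(total_channels - 3)]
    return [[row[:] for row in channel] for channel in rgb] + extra


def strip_state(state: List[Channel]) -> List[Channel]:
    """Recover the RGB planes by dropping the padded channels."""
    return state[:3]


# A 2 x 2 toy image, channel-first (hypothetical values).
rgb = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
    [[0.9, 1.0], [0.0, 0.1]],
]
state = pad_state(rgb, total_channels=5)
```

Padding with a fixed constant keeps the image-side state the same shape as the latent-side state, so a single velocity field of constant shape can be evaluated at every timestep in either direction.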