Disentangled Representation Learning via Flow Matching
Jinjin Chi, Taoping Liu, Mengtao Yin, Ximing Li, Yongcheng Jing, Dacheng Tao
TL;DR
This work introduces a flow-matching framework for disentangled representation learning that casts disentanglement as learning factor-conditioned flows in a latent space. It decomposes the latent transport velocity into factor-specific components and enforces semantic alignment with an orthogonality regularizer, implemented via an output-attention mechanism, so that each factor drives a non-overlapping, factor-wise transformation. Experiments on Cars3D, Shapes3D, MPI3D-toy, and CelebA show substantial gains in disentanglement metrics (e.g., FactorVAE score and DCI), better controllability, and competitive sample fidelity relative to VAE-, GAN-, and diffusion-based baselines. The approach offers a deterministic, geometry-driven alternative to stochastic diffusion, with practical benefits for downstream task efficiency and semantic editing. Overall, by aligning factor semantics with latent transport dynamics, the method achieves reliable factor-level control.
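To make the velocity decomposition and the non-overlap idea concrete, here is a minimal plain-Python sketch. It assumes, purely for illustration, that the total latent velocity is the sum of K factor-specific velocity vectors and that the orthogonality regularizer penalizes the mean squared cosine similarity between every pair of components. The names `combine_velocity` and `orthogonality_penalty` are hypothetical, and the paper's output-attention implementation is not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def combine_velocity(components: List[Vector]) -> Vector:
    """Sum K factor-specific velocity components into one latent velocity.

    Assumes an additive decomposition v = v_1 + ... + v_K (illustrative only).
    """
    dim = len(components[0])
    return [sum(c[i] for c in components) for i in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonality_penalty(components: List[Vector], eps: float = 1e-8) -> float:
    """Mean squared cosine similarity over all pairs of factor components.

    Returns 0.0 when every pair is orthogonal (no overlap) and approaches 1.0
    when all components are parallel or anti-parallel.
    """
    k = len(components)
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            a, b = components[i], components[j]
            norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
            cos = dot(a, b) / (norm + eps)  # eps guards against zero vectors
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0


# Usage: two non-overlapping components vs. two overlapping ones.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
overlapping = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(combine_velocity(orthogonal))        # [1.0, 2.0, 0.0]
print(orthogonality_penalty(orthogonal))   # 0.0
print(orthogonality_penalty(overlapping))  # approximately 0.5
```

Squared cosine is scale-invariant, so this penalty targets directional overlap between factor components rather than their magnitudes; an attention-based mechanism may enforce non-overlap quite differently.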
Abstract
Disentangled representation learning aims to capture the underlying explanatory factors of observed data, enabling a principled understanding of the data-generating process. Recent advances in generative modeling have introduced new paradigms for learning such representations. However, existing diffusion-based methods encourage factor independence through inductive biases but often lack strong semantic alignment. In this work, we propose a flow-matching-based framework for disentangled representation learning that casts disentanglement as learning factor-conditioned flows in a compact latent space. To enforce explicit semantic alignment, we introduce a non-overlap (orthogonality) regularizer that suppresses cross-factor interference and reduces information leakage between factors. Extensive experiments across multiple datasets demonstrate consistent improvements over representative baselines, yielding higher disentanglement scores as well as improved controllability and sample fidelity.
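For readers new to flow matching, the sketch below shows a standard conditional flow-matching objective and deterministic Euler sampling in plain Python. It assumes a straight-line probability path x_t = (1 - t) x_0 + t x_1 with Gaussian source noise and a velocity model conditioned on a factor code; this path, the function names (`flow_matching_loss`, `sample_loss`, `euler_transport`), and the constant `toy_velocity` field are hypothetical illustrations, not the authors' architecture or training setup.

```python
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: velocity(x_t, t, factor_code) -> velocity vector.
VelocityFn = Callable[[Vector, float, Vector], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity: VelocityFn, x0: Vector, x1: Vector,
                       t: float, factor_code: Vector) -> float:
    """Squared error between the predicted velocity and the path's target x1 - x0."""
    x_t = interpolate(x0, x1, t)
    pred = velocity(x_t, t, factor_code)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - u) ** 2 for p, u in zip(pred, target))


def sample_loss(velocity: VelocityFn, x1: Vector, factor_code: Vector,
                rng: random.Random) -> float:
    """One Monte Carlo sample of the loss: Gaussian source x0 and t ~ U(0, 1)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    return flow_matching_loss(velocity, x0, x1, t, factor_code)


def euler_transport(velocity: VelocityFn, x0: Vector, factor_code: Vector,
                    steps: int = 10) -> Vector:
    """Integrate dx/dt = v(x, t, c) from t = 0 to t = 1 with explicit Euler steps.

    No noise is injected, so the same x0 and factor code always give the same output.
    """
    x, dt = list(x0), 1.0 / steps
    for n in range(steps):
        v = velocity(x, n * dt, factor_code)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x_t: Vector, t: float, factor_code: Vector) -> Vector:
    """Hypothetical stand-in for a trained network: returns the factor code as a constant velocity."""
    return list(factor_code)


code = [1.0, 2.0]
print(euler_transport(toy_velocity, [0.0, 0.0], code, steps=4))             # [1.0, 2.0]
print(flow_matching_loss(toy_velocity, [0.0, 0.0], [1.0, 2.0], 0.5, code))  # 0.0
print(sample_loss(toy_velocity, [1.0, 2.0], code, random.Random(0)) >= 0.0)  # True
```

The Euler sampler illustrates why flow-based generation is deterministic: once the velocity field is fixed, the same starting point and factor code always map to the same output, unlike a stochastic diffusion sampler.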
