MixerFlow: MLP-Mixer meets Normalising Flows
Eshant English, Matthias Kirchler, Christoph Lippert
TL;DR
This paper addresses density estimation and data generation for images using normalising flows, noting limitations of Glow-based backbones in expressivity and parameter efficiency. It introduces MixerFlow, an invertible architecture based on the discriminative MLP-Mixer that employs channel-mixing and patch-mixing flows with weight sharing, a shift layer for boundary interactions, and ActNorm initialisation to stabilise training. Across 32×32 and 64×64 image benchmarks, MixerFlow achieves competitive or superior negative log-likelihood (bits-per-dimension) while using fewer parameters, and demonstrates robust performance under permutations and in hybrid modelling with MAF. The approach shows versatility by enabling integration of splines and Kolmogorov-Arnold Networks (KAN) and suggests strong potential for scalable, informative representations and downstream tasks, with planned future work on multiscale designs and stronger inductive biases.
Abstract
Normalising flows are generative models that transform a complex density into a simpler density through the use of bijective transformations, enabling both density estimation and data generation from a single model. In the context of image modelling, the predominant choice has been the Glow-based architecture, whereas alternative architectures remain largely unexplored in the research community. In this work, we propose a novel architecture called MixerFlow, based on the MLP-Mixer architecture, further unifying the generative and discriminative modelling architectures. MixerFlow offers an efficient mechanism for weight sharing for flow-based models. Our results demonstrate comparable or superior density estimation on image datasets and good scaling as the image resolution increases, making MixerFlow a simple yet powerful alternative to the Glow-based architectures. We also show that MixerFlow provides more informative embeddings than Glow-based architectures and can integrate many structured transformations such as splines or Kolmogorov-Arnold Networks.
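To make the weight-sharing idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes an image has already been reshaped into a (patches × channels) matrix and uses plain invertible matrices as stand-ins for the channel-mixing and patch-mixing flows; the names `mixer_step`, `mixer_step_inverse`, `log_prob`, and `bits_per_dim` are hypothetical. Because one matrix is shared across all rows (or all columns), its log-determinant is simply counted once per row (or column). Dequantisation and any data-scaling offsets are omitted.

```python
import numpy as np


def mixer_step(x, w_channel, w_patch):
    """Apply one toy mixing step to x of shape (patches, channels).

    Returns the transformed matrix z and log|det J| of the map x -> z.
    The linear maps are illustrative stand-ins for invertible flow layers.
    """
    n_patches, n_channels = x.shape
    # Channel mixing: the same matrix acts on every patch (row).
    h = x @ w_channel.T
    # Patch mixing: the same matrix acts on every channel (column).
    z = w_patch @ h
    _, logdet_c = np.linalg.slogdet(w_channel)
    _, logdet_p = np.linalg.slogdet(w_patch)
    # Shared weights: each log-determinant is counted once per row or column.
    log_det = n_patches * logdet_c + n_channels * logdet_p
    return z, log_det


def mixer_step_inverse(z, w_channel, w_patch):
    """Invert mixer_step, which is what makes sampling possible."""
    h = np.linalg.solve(w_patch, z)           # undo patch mixing
    return np.linalg.solve(w_channel, h.T).T  # undo channel mixing


def log_prob(x, w_channel, w_patch):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det J|."""
    z, log_det = mixer_step(x, w_channel, w_patch)
    log_base = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return log_base + log_det


def bits_per_dim(x, w_channel, w_patch):
    """Negative log-likelihood in bits, averaged over dimensions."""
    return -log_prob(x, w_channel, w_patch) / (x.size * np.log(2.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_patches, n_channels = 16, 12  # arbitrary sizes for illustration
    x = rng.normal(size=(n_patches, n_channels))
    w_c = np.eye(n_channels) + 0.1 * rng.normal(size=(n_channels, n_channels))
    w_p = np.eye(n_patches) + 0.1 * rng.normal(size=(n_patches, n_patches))
    z, _ = mixer_step(x, w_c, w_p)
    print("round-trip ok:", np.allclose(mixer_step_inverse(z, w_c, w_p), x))
    print("bits per dimension:", bits_per_dim(x, w_c, w_p))
```

In this toy setting the two shared matrices hold 16² + 12² = 400 parameters, whereas an unconstrained linear map on the flattened 192-dimensional input would need 192² = 36,864, which illustrates why sharing weights across patches and channels can be parameter-efficient.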
