MANGO: Learning Disentangled Image Transformation Manifolds with Grouped Operators
Brighton Ancelin, Yenho Chen, Peimeng Guan, Chiraag Kaushik, Belen Martin-Urcelay, Alex Saad-Falcon, Nakul Singh
TL;DR
MANGO learns semantically meaningful image transformations by enforcing disentanglement through grouped, block-diagonal operators, each acting on a distinct latent subspace. It introduces a one-phase training objective that jointly optimizes reconstruction and latent transformation consistency, which enables end-to-end learning and much faster training (up to $100\times$ faster than the prior Manifold Autoencoder (MAE) approach, and up to $1500\times$ in certain settings). The learned transformations are composable and interpretable, and they extrapolate beyond the training data; experiments on MNIST show improved disentanglement (higher Mutual Information Gap, MIG) together with robust reconstruction. The framework is aimed at applications such as data augmentation and scientific visualization.
Abstract
Learning semantically meaningful image transformations (e.g., rotation, thickness, blur) directly from examples can be a challenging task. Recently, the Manifold Autoencoder (MAE) proposed using a set of Lie group operators to learn image transformations directly from examples. However, this approach has two limitations: the learned operators are not guaranteed to be disentangled, and the training routine becomes prohibitively expensive as the model scales up. To address these limitations, we propose MANGO (transformation Manifolds with Grouped Operators) for learning disentangled operators that describe image transformations in distinct latent subspaces. Moreover, our approach lets practitioners specify which transformations they aim to model, which improves the semantic meaning of the learned operators. Through our experiments, we demonstrate that MANGO enables composition of image transformations and introduces a one-phase training routine that leads to a $100\times$ speedup over prior work.
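To make the idea of grouped operators on distinct latent subspaces concrete, the following is a minimal NumPy sketch, not the MANGO implementation. It assumes a 4-dimensional latent vector split into two 2-dimensional subspaces, with one hand-picked generator per subspace (in MANGO the operators are learned). The names `expm_taylor`, `block_diag`, `transform`, `G_rot`, and `G_scale` are hypothetical, and the truncated Taylor series stands in for a proper matrix exponential routine.

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Truncated Taylor series for exp(A); adequate for small, well-scaled A."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # A^k / k!
        result = result + term
    return result

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        d = b.shape[0]
        out[i:i + d, i:i + d] = b
        i += d
    return out

# Hypothetical generators, one per latent subspace (assumption: fixed, not learned).
G_rot = np.array([[0.0, -1.0], [1.0, 0.0]])    # generates a planar rotation
G_scale = np.array([[1.0, 0.0], [0.0, 1.0]])   # generates uniform scaling

def transform(z, coeffs, generators):
    """Apply exp(c_i * G_i) to the i-th latent subspace of z."""
    blocks = [expm_taylor(c * G) for c, G in zip(coeffs, generators)]
    return block_diag(blocks) @ z

generators = [G_rot, G_scale]
z = np.array([1.0, 0.0, 1.0, 1.0])

# Rotate the first subspace by 90 degrees; leave the second untouched.
z_rot = transform(z, [np.pi / 2, 0.0], generators)

# Composition: applying each operator separately matches applying both at once.
z_both = transform(z, [0.3, 0.5], generators)
z_seq = transform(transform(z, [0.3, 0.0], generators), [0.0, 0.5], generators)
```

Because the operator is block-diagonal, changing one coefficient only moves the latent coordinates in its own subspace, which is the structural property that the disentanglement claim relies on. It also means operators on different subspaces commute, so `z_both` and `z_seq` agree in this sketch.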
