MANGO: Learning Disentangled Image Transformation Manifolds with Grouped Operators

Brighton Ancelin, Yenho Chen, Peimeng Guan, Chiraag Kaushik, Belen Martin-Urcelay, Alex Saad-Falcon, Nakul Singh

TL;DR

MANGO tackles the challenge of learning semantically meaningful image transformations by enforcing disentanglement through grouped, block-diagonal operators that act on distinct latent subspaces. It introduces a one-phase training objective that jointly optimizes reconstruction and latent transformation consistency. This enables end-to-end learning and much faster training: up to $100\times$ faster than the prior MAE approach, and up to $1500\times$ in some settings. The approach yields composable, interpretable transformations that extrapolate beyond the training data. Experiments on MNIST show improved disentanglement (a higher mutual information gap, MIG) and robust reconstruction. The framework offers a scalable, interpretable route to semantically meaningful transformation learning, with practical applications in data augmentation and scientific visualization.

Abstract

Learning semantically meaningful image transformations (e.g., rotation, thickness, blur) directly from examples can be challenging. Recently, the Manifold Autoencoder (MAE) proposed using a set of Lie group operators to learn image transformations directly from examples. However, this approach has two limitations. The learned operators are not guaranteed to be disentangled, and the training routine becomes prohibitively expensive as the model is scaled up. To address these limitations, we propose MANGO (transformation Manifolds with Grouped Operators) for learning disentangled operators that describe image transformations in distinct latent subspaces. Moreover, our approach lets practitioners define which transformations they aim to model, which improves the semantic meaning of the learned operators. Our experiments demonstrate that MANGO enables composition of image transformations. We also introduce a one-phase training routine that yields a 100x speedup over prior work.
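The grouped-operator idea can be illustrated with a minimal NumPy/SciPy sketch. It assumes that each named transformation owns a contiguous block of the latent space and has a single generator matrix $A_m$. Under that assumption, the transport operator is $\exp(\mathrm{blockdiag}(c_1 A_1, \dots, c_K A_K))$, which equals the block-diagonal matrix of the per-group exponentials. The group names, block sizes, and random generators are hypothetical placeholders, not learned MANGO operators.

```python
import numpy as np
from scipy.linalg import block_diag, expm

rng = np.random.default_rng(0)

# Hypothetical grouping: a 6-dim latent space split into two 3-dim subspaces,
# one per user-chosen transformation. Real block sizes would be design choices.
group_dims = {"rotation": 3, "thickness": 3}

# One random generator per group, standing in for learned Lie algebra elements.
generators = {name: 0.3 * rng.standard_normal((d, d)) for name, d in group_dims.items()}


def transport(coeffs):
    """Block-diagonal transport operator; groups missing from `coeffs` get c = 0 (identity block)."""
    blocks = [expm(coeffs.get(name, 0.0) * A) for name, A in generators.items()]
    return block_diag(*blocks)


z = rng.standard_normal(6)

# Disentanglement: moving along "rotation" leaves the "thickness" coordinates untouched.
z_rot = transport({"rotation": 0.8}) @ z
unchanged = np.allclose(z_rot[3:], z[3:])

# Composition: the blocks act on disjoint subspaces, so the two operators commute,
# and their product equals a single operator with both coefficients set.
T_rot = transport({"rotation": 0.8})
T_thick = transport({"thickness": -0.5})
commute = np.allclose(T_rot @ T_thick, T_thick @ T_rot)
combined = np.allclose(T_rot @ T_thick, transport({"rotation": 0.8, "thickness": -0.5}))

print(unchanged, commute, combined)  # → True True True
```

Within a single block, $\exp(aA_m)\exp(bA_m) = \exp((a+b)A_m)$, so repeated steps of the same transformation also compose by adding coefficients.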

Paper Structure

This paper contains 14 sections, 3 equations, 7 figures, and 1 algorithm.

Figures (7)

  • Figure 1: Block diagram of MANGO. The model simultaneously learns a transport operator $\mathbf{A}$, whose block-diagonal components each represent a semantically meaningful transformation, and an autoencoder that reconstructs the transformed image.
  • Figure 2: Augmentations with various combinations of rotations and thickness changes.
  • Figure 3: MANGO achieves a neatly disentangled latent space. The figure shows the magnitude of each coordinate in the first principal component for both models. MANGO exhibits stronger concentration and alignment with learned operator coordinates.
  • Figure 4: Training runtimes (in seconds) per batch (of size 64) for fixed dictionary size $M=8$ and for fixed latent dimension $L=32$.
  • Figure 5: Comparison of image transformations. MANGO transformations retain image identity, unlike those of the AE.
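The pipeline in Figure 1 suggests a single objective in which the encoder, decoder, and block-diagonal operator are trained together rather than in separate stages. The sketch below only evaluates such a loss for one image pair. Several choices in it are assumptions made for illustration, not the paper's exact formulation: the squared-error terms, the weight `lam`, the linear encoder/decoder, and treating the per-pair coefficients as known.

```python
import numpy as np
from scipy.linalg import block_diag, expm


def transport(coeffs, generators):
    """Block-diagonal operator: one exp(c_m * A_m) block per transformation group."""
    return block_diag(*[expm(c * A) for c, A in zip(coeffs, generators)])


def one_phase_loss(x0, x1, coeffs, encode, decode, generators, lam=1.0):
    """Hypothetical joint loss for an image pair (x0, x1), where x1 is a transformed x0.

    recon:  decode the transported latent of x0 and compare it with x1.
    latent: the transported latent should match the encoding of x1.
    """
    z0 = encode(x0)
    z1_hat = transport(coeffs, generators) @ z0
    recon = np.sum((decode(z1_hat) - x1) ** 2)
    latent = np.sum((encode(x1) - z1_hat) ** 2)
    return recon + lam * latent


# Toy check with a linear autoencoder whose encoder has orthonormal rows (W @ W.T = I).
rng = np.random.default_rng(0)
D, L = 10, 4                                    # image and latent sizes (hypothetical)
Q, _ = np.linalg.qr(rng.standard_normal((D, L)))
W = Q.T                                         # L x D encoder matrix


def encode(x):
    return W @ x


def decode(z):
    return W.T @ z


generators = [rng.standard_normal((2, 2)), rng.standard_normal((2, 2))]  # two 2-dim groups

true_coeffs = [0.7, -0.4]
z0 = rng.standard_normal(L)
x0 = decode(z0)
x1 = decode(transport(true_coeffs, generators) @ z0)

loss_true = one_phase_loss(x0, x1, true_coeffs, encode, decode, generators)
loss_wrong = one_phase_loss(x0, x1, [0.0, 0.0], encode, decode, generators)
print(loss_true < 1e-10, loss_wrong > loss_true)  # → True True
```

In a one-phase routine, the gradients of this single loss would update the encoder, decoder, and generators simultaneously, for example with an autodiff framework. This toy version only checks that the loss vanishes for the correct coefficients and grows for wrong ones.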

Theorems & Definitions (1)

  • Definition 1