Interaction Asymmetry: A General Principle for Learning Composable Abstractions

Jack Brady, Julius von Kügelgen, Sébastien Lachapelle, Simon Buchholz, Thomas Kipf, Wieland Brendel

TL;DR

The authors propose a flexible Transformer-based VAE with a novel regularizer on the attention weights of the decoder. On synthetic image datasets consisting of objects, this model can achieve object disentanglement comparable to that of existing models that use more explicit object-centric priors.

Abstract

Learning disentangled representations of concepts and re-composing them in unseen ways is crucial for generalizing to out-of-domain situations. However, the underlying properties of concepts that enable such disentanglement and compositional generalization remain poorly understood. In this work, we propose the principle of interaction asymmetry which states: "Parts of the same concept have more complex interactions than parts of different concepts". We formalize this via block diagonality conditions on the $(n+1)$th order derivatives of the generator mapping concepts to observed data, where different orders of "complexity" correspond to different $n$. Using this formalism, we prove that interaction asymmetry enables both disentanglement and compositional generalization. Our results unify recent theoretical results for learning concepts of objects, which we show are recovered as special cases with $n\!=\!0$ or $1$. We provide results for up to $n\!=\!2$, thus extending these prior works to more flexible generator functions, and conjecture that the same proof strategies generalize to larger $n$. Practically, our theory suggests that, to disentangle concepts, an autoencoder should penalize its latent capacity and the interactions between concepts during decoding. We propose an implementation of these criteria using a flexible Transformer-based VAE, with a novel regularizer on the attention weights of the decoder. On synthetic image datasets consisting of objects, we provide evidence that this model can achieve comparable object disentanglement to existing models that use more explicit object-centric priors.
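
To make the block-diagonality condition concrete, the following is a minimal sketch for the case $n\!=\!1$, read here as: the second-order derivatives of the generator vanish across slots but not within them. The toy generator, the slot partition, and all function names are hypothetical illustrations and do not come from the paper; the Hessian is estimated with plain central finite differences rather than automatic differentiation.

```python
import math

# Hypothetical toy generator with one output and two slots:
# z = (z0, z1 | z2, z3), slot B1 = {0, 1}, slot B2 = {2, 3}.
# It is additive across slots, so cross-slot second derivatives vanish,
# while each slot has non-trivial (second-order) interactions internally.
def toy_generator(z):
    z0, z1, z2, z3 = z
    return math.sin(z0 * z1) + z2 * z2 * z3

SLOTS = [(0, 1), (2, 3)]

def mixed_partial(func, z, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 func / dz_i dz_j at z."""
    def shifted(di, dj):
        w = list(z)
        w[i] += di * h
        w[j] += dj * h
        return func(w)
    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)

def hessian(func, z):
    """Full matrix of estimated second-order partial derivatives."""
    n = len(z)
    return [[mixed_partial(func, z, i, j) for j in range(n)] for i in range(n)]

def max_cross_slot_entry(hess, slots):
    """Largest |entry| whose row and column indices lie in different slots."""
    slot_of = {i: k for k, block in enumerate(slots) for i in block}
    return max(abs(hess[i][j])
               for i in slot_of for j in slot_of
               if slot_of[i] != slot_of[j])

z = [0.5, 1.2, 0.7, -0.3]
H = hessian(toy_generator, z)
print("largest cross-slot entry:", max_cross_slot_entry(H, SLOTS))
print("within-slot entries:", H[0][1], H[2][3])
```

For this toy generator the cross-slot entries are zero up to floating-point error while the within-slot entries are not, which is the asymmetry the principle describes at second order. A generator with a cross-slot product term such as `z1 * z2` would fail this check.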

Paper Structure

This paper contains 40 sections, 23 theorems, 179 equations, 7 figures, 1 table.

Key Result

Theorem 4.2

Let $n \in \{0, 1, 2\}$. Let $\bm{f}: \mathcal{Z} \rightarrow \mathcal{X}$ be a $C^{n+1}$ diffeomorphism satisfying interaction asymmetry for all equivalent generators (Definition 4.1) and sufficient independence. Let $\mathcal{Z}_\textnormal{supp}$ be regular close

Figures (7)

  • Figure 1: Illustration of Interaction Asymmetry. (Left) Observations $\bm{x}$ result from a generator $\bm{f}$ applied to latent slots $\bm{z}_{B_k}$ that represent separate concepts. As indicated by the reflection of the cylinder upon the cube, slots can interact during generation. Our key assumption, interaction asymmetry, states that these interactions across slots must be less complex than interactions within the same slot. (Right) This is formalized by assuming block-diagonality across but not within slots for the $(n\!+\!1)$th order derivatives of the generator, i.e., $D^{n+1}\bm{f}$.
  • Figure 2: See intuition for Theorem \ref{theo:decoder_generalization}.
  • Figure 3: (A) Sprites: Normalized slot-wise Jacobians for an unregularized ($\alpha=0,\beta=0$) Transformer, a regularized ($\alpha>0,\beta>0$) Transformer, and a Spatial Broadcast Decoder (SBD). The unregularized model encodes objects across multiple slots, while the regularized model matches the disentanglement of the SBD. (B) CLEVR6: Slot-wise Jacobians for a regularized Transformer and an SBD on objects in CLEVR6 that interact via reflections. As the reconstructions and Jacobians show, the regularized Transformer models reflections while mostly removing unnecessary interactions, whereas the SBD fails to model reflections due to its restricted architecture.
  • Figure 4: PyTorch code to compute $\mathcal{L}_{\text{interact}}$.
  • Figure 5: Analysis of $\mathcal{L}_{\text{interact}}$ when using a VAE loss. We plot $\mathcal{L}_{\text{interact}}$ for the first 400,000 training iterations for a Transformer autoencoder trained without regularization ($\alpha\!=\!0, \beta\!=\!0$) and with a VAE loss which does not explicitly optimize $\mathcal{L}_{\text{interact}}$ ($\alpha\!=\!0, \beta\!=\!0.05$).
  • ...and 2 more figures
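
The PyTorch code referred to in the Figure 4 caption is not reproduced on this page. As a rough illustration only, the sketch below shows one plausible form of a penalty on decoder attention weights: for each pixel query, it sums the pairwise products of the attention mass placed on different slots, so the penalty is zero when every pixel attends to a single slot. This form, the function name `interaction_penalty`, and the list-of-rows input layout are assumptions made for illustration, not the authors' implementation, and plain Python is used instead of PyTorch to keep the example dependency-free.

```python
def interaction_penalty(attn):
    """Mean over pixels of sum_{j<k} attn[l][j] * attn[l][k].

    attn: list of rows, one per pixel query; each row holds non-negative
    attention weights over slots (assumed to sum to 1, e.g. after softmax).
    This is an assumed, illustrative form of an attention regularizer.
    """
    total = 0.0
    for row in attn:
        weight_sum = sum(row)
        square_sum = sum(a * a for a in row)
        # Identity: sum_{j<k} a_j a_k = ((sum_j a_j)^2 - sum_j a_j^2) / 2
        total += 0.5 * (weight_sum * weight_sum - square_sum)
    return total / len(attn)

# Each pixel attends to exactly one slot: no cross-slot mixing.
sharp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Each pixel splits its attention evenly between two slots.
mixed = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]

print(interaction_penalty(sharp))  # 0.0
print(interaction_penalty(mixed))  # 0.25
```

Under this assumed form, minimizing the penalty pushes each pixel toward depending on a single slot, while a pixel that needs information from two slots (for example, a reflection) can still pay a small cost to attend to both.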

Theorems & Definitions (51)

  • Definition 2.1: Disentanglement
  • Definition 2.2: Compositional Generalization
  • Definition 3.2: At most $0$th order/No interaction
  • Definition 3.3: At most $1$st order interaction
  • Definition 3.4: At most $n$th order interaction
  • Definition 4.1: Equivalent Generators
  • Definition 4.1: Sufficient Independence ( Order)
  • Theorem 4.2: Disentanglement on $\mathcal{Z}_\textnormal{supp}$
  • Theorem 4.3: Compositional Generalization
  • Definition A.1: $C^k$-diffeomorphism
  • ...and 41 more