Measuring Orthogonality in Representations of Generative Models

Robin C. Geyer, Alessandro Torcinovich, João B. Carvalho, Alexander Meyer, Joachim M. Buhmann

TL;DR

Experiments with two proposed metrics, Importance-Weighted Orthogonality (IWO) and Importance-Weighted Rank (IWR), suggest that representation quality is more closely related to the orthogonality of independent generative processes than to their disentanglement, offering a new direction for evaluating and improving unsupervised learning models.

Abstract

In unsupervised representation learning, models aim to distill essential features from high-dimensional data into lower-dimensional learned representations, guided by inductive biases. Understanding the characteristics that make a good representation remains a topic of ongoing research. Disentanglement of independent generative processes has long been credited with producing high-quality representations. However, focusing solely on representations that adhere to the stringent requirements of most disentanglement metrics may result in overlooking many high-quality representations that are well suited for various downstream tasks. These metrics often demand that generative factors be encoded in distinct, single dimensions aligned with the canonical basis of the representation space. Motivated by these observations, we propose two novel metrics: Importance-Weighted Orthogonality (IWO) and Importance-Weighted Rank (IWR). These metrics evaluate the mutual orthogonality and rank of generative factor subspaces. In extensive experiments on common downstream tasks, over several benchmark datasets and models, IWO and IWR consistently show stronger correlations with downstream task performance than traditional disentanglement metrics. Our findings suggest that representation quality is more closely related to the orthogonality of independent generative processes than to their disentanglement, offering a new direction for evaluating and improving unsupervised learning models.
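The difference between axis alignment (what most disentanglement metrics reward) and orthogonality can be illustrated with a small sketch. This is not the paper's IWO or IWR computation; it only compares the cosine between two factor directions in a 2-d latent space with a simple alignment score. The helpers `factor_directions`, `cosine`, and `axis_alignment`, and the chosen angles, are assumptions made for illustration.

```python
import numpy as np

def factor_directions(angle_deg):
    """Return an orthogonal pair of unit directions, rotated by angle_deg."""
    t = np.deg2rad(angle_deg)
    d_size = np.array([np.cos(t), np.sin(t)])
    d_color = np.array([-np.sin(t), np.cos(t)])
    return d_size, d_color

def cosine(u, v):
    """Cosine of the angle between two latent directions (0 means orthogonal)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def axis_alignment(d):
    """Share of the squared norm carried by the largest coordinate (1.0 = axis-aligned)."""
    return float(np.max(d ** 2) / np.sum(d ** 2))

# Hypothetical configurations: axis-aligned, rotated but orthogonal, and skewed.
aligned = factor_directions(0.0)
rotated = factor_directions(45.0)
t30 = np.deg2rad(30.0)
skewed = (np.array([1.0, 0.0]), np.array([np.cos(t30), np.sin(t30)]))

for name, (u, v) in {"aligned": aligned, "rotated": rotated, "skewed": skewed}.items():
    print(f"{name:8s} cosine={cosine(u, v):.2f} "
          f"alignment=({axis_alignment(u):.2f}, {axis_alignment(v):.2f})")
```

In this toy setting the rotated pair scores poorly on alignment while remaining exactly orthogonal, whereas the skewed pair is partly aligned with the axes but not orthogonal.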

Paper Structure

This paper contains 48 sections, 4 theorems, 17 equations, 7 figures, and 8 tables.

Key Result

Theorem B.1

Given a representation ${\bm{c}} \in \mathbb{R}^L$, $\overline{\text{IWO}} = 1$ if and only if the latent subspaces $\mathbb{S}_1, \dots, \mathbb{S}_K$ of the generative factors ${z}_1, \dots, {z}_K$ all lie in each other's orthogonal complement.
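The geometric condition in the theorem (every factor subspace lies in the orthogonal complement of every other) can be checked numerically once a basis for each subspace is available. The sketch below assumes each subspace $\mathbb{S}_k$ is given as a matrix whose columns span it; the function name `mutually_orthogonal`, the tolerance, and the random example are illustrative assumptions. It tests only the condition itself and does not compute the IWO score.

```python
import numpy as np

def mutually_orthogonal(bases, tol=1e-10):
    """Check that every pair of subspaces is orthogonal.

    bases: list of arrays of shape (L, r_k); the columns of bases[k] span S_k.
    Two subspaces are orthogonal exactly when all cross inner products vanish.
    """
    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            if np.max(np.abs(bases[i].T @ bases[j])) > tol:
                return False
    return True

rng = np.random.default_rng(0)
# Random orthonormal basis of R^6, split into three subspaces (ranks 2, 2, 1).
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
S1, S2, S3 = Q[:, :2], Q[:, 2:4], Q[:, 4:5]

# Orthogonal even though no subspace is aligned with the canonical axes.
print(mutually_orthogonal([S1, S2, S3]))

# Mixing directions of S1 into S2 breaks the condition.
S2_tilted = S2 + 0.3 * S1
print(mutually_orthogonal([S1, S2_tilted, S3]))
```

The first call reports that the condition holds for a rotated, non-axis-aligned configuration; the second reports a violation after the subspaces are made to overlap.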

Figures (7)

  • Figure 1: Three configurations of data (circles) encoded in a $2$-d learned latent space. The data is characterized by size and (grayscale) color factors. The blue axes represent the direction of change in the factors. (i) The factors are aligned with the basis of the space, corresponding to perfect disentanglement and perfect orthogonality. (ii) The factors are not aligned but orthogonal, corresponding to complete entanglement, but still perfect orthogonality. (iii) The factors are not orthogonal and some circle configurations are not encoded; however, disentanglement is higher than in (ii) because of partial alignment with the basis. Despite its complete entanglement, we argue that latent space (ii) is just as well suited as (i) for common downstream tasks, while (iii) is not.
  • Figure 2: Overview of GCA. I - Subspace Learning: Through iterative multiplications with ${\bm{W}}_l \in \mathbb{R}^{l \times (l + 1)}$, the input is projected to subspaces of decreasing dimensionality. The resulting outputs ${\bm{w}}_d$ are directed into NN heads, trained to minimize the expected loss terms $\mathcal{L}_{l}$. The importance $\alpha_l$ is gauged by the loss decrease between consecutive NN heads. II - Basis Generation: The least important dimension, ${\bm{b}}_4$, corresponds to the null space of ${\bm{W}}_{3}$. For each subsequent dimension ${\bm{b}}_{l}$, the composed projection matrix $\hat{{\bm{W}}}_{l-1} = {\bm{W}}_{l-1} \cdot \dots \cdot {\bm{W}}_{3}$ is computed. ${\bm{b}}_{l}$ then corresponds to the dimension in the null space of $\hat{{\bm{W}}}_{l-1}$ which is orthogonal to all previously found basis vectors. Finally, ${\bm{b}}_{1}$ is retrieved by normalizing and transposing $\hat{{\bm{W}}}_{1}$.
  • Figure 3: Samples from a $\beta$-VAE trained on the Shapes3D dataset. In each row, the same latent code is modulated along a different dimension. The reconstruction of the resulting latent codes through the decoder network is depicted. Left: The dimensions of modulation correspond to the most important generative components of each generative factor as found by GCA. We observe that modulation along the generative components indeed predominantly varies the respective generative factor. Right: The dimensions of modulation correspond to the most important dimensions for each generative factor as found by DCI. However, modulation along the dimension supposedly encoding floor color also changes azimuth and vice versa. Modulation along the dimension supposedly encoding wall color also changes floor color. The same form of entanglement applies to most other dimensions identified through the DCI framework.
  • Figure 4: Four configurations of a $3$-dimensional latent space. The planes represent the latent subspace where generative factors ${z}_1$, ${z}_2$ lie. The color mapping on each subspace represents the relationship between the generative factor and the latent components (e.g., blue indicating large values for ${z}_1$, red indicating low ones). Cases (i) and (ii) are characterized by a good explicitness score as both subspaces encode ${z}_1$ and ${z}_2$ as simple quadratic functions, contrary to cases (iii) and (iv) where the relationship is trigonometric and more complex to recover. In contrast, cases (i) and (iii) are characterized by a better IWO score compared to (ii) and (iv). Indeed, in configurations (i) and (iii), there are dimensions within each generative factor's subspace that are orthogonal to one another. Consequently, any variation along any such dimension will leave the other factor unchanged.
  • Figure 5: Two synthetic experimental settings with $L=10$, $K=5$ and differing ranks $R$. Left: $R=2$, each ${z}_j$ is a function of two successive elements of ${\bm{c}}$: ${z}_1 = f({c}_1, {c}_2)$, $\dots$, ${z}_5 = f({c}_9, {c}_{10})$. Right: $R=5$, each ${z}_j$ is a function of five successive elements of ${\bm{c}}$: ${z}_1 = f({c}_9, {c}_{10}, {c}_1, {c}_2, {c}_3)$, $\dots$, ${z}_5 = f({c}_7, {c}_{8}, {c}_9, {c}_{10}, {c}_1)$.
  • ...and 2 more figures
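
The basis-generation step described in the Figure 2 caption can be sketched with plain linear algebra. The code below is a minimal illustration under stated assumptions: the projection matrices ${\bm{W}}_l$ are random stand-ins rather than trained weights, the helper names `null_space` and `gca_basis` are hypothetical, and the importance weights $\alpha_l$ (which require trained NN heads) are left out. Each new basis vector is taken from the null space of the composed projection and chosen orthogonal to the vectors already found.

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of A, via SVD."""
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def gca_basis(Ws):
    """Ws = [W_{L-1}, ..., W_1], where W_l has shape (l, l + 1).

    Returns an (L, L) array whose rows are b_1, ..., b_L.
    """
    L = Ws[0].shape[1]
    found = []                      # b_L, b_{L-1}, ..., b_2 in order of discovery
    W_hat = np.eye(L)
    for W in Ws:
        W_hat = W @ W_hat           # composed projection, shape (l, L)
        N = null_space(W_hat)       # null-space basis, shape (L, L - l)
        if found:
            P = np.stack(found, axis=1)
            # Coefficients selecting the null-space direction orthogonal to P.
            coeffs = null_space(P.T @ N)[:, 0]
            b = N @ coeffs
        else:
            b = N[:, 0]
        found.append(b / np.linalg.norm(b))
    b1 = W_hat[0] / np.linalg.norm(W_hat[0])   # W_hat is now 1 x L
    return np.stack([b1] + found[::-1])

rng = np.random.default_rng(0)
# Random stand-ins for trained projections with L = 4: W_3, W_2, W_1.
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3)), rng.normal(size=(1, 2))]
B = gca_basis(Ws)
print(np.round(B @ B.T, 6))         # close to the identity if the basis is orthonormal
```

Because the null spaces of the composed projections are nested, the vectors found this way form an orthonormal basis of the latent space, ordered from the direction kept longest (${\bm{b}}_1$) to the one discarded first (${\bm{b}}_L$).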

Theorems & Definitions (8)

  • Theorem B.1
  • proof
  • Theorem B.2
  • proof
  • Theorem B.3
  • proof
  • Theorem B.4
  • proof