Towards an Improved Metric for Evaluating Disentangled Representations

Sahib Julka, Yashu Wang, Michael Granitzer

TL;DR

The paper proposes a new framework for quantifying disentanglement and introduces EDI, a metric that leverages the intuitive concept of exclusivity and an improved factor-code relationship to minimize ad-hoc decisions. The authors advocate its adoption as a standardised approach.

Abstract

Disentangled representation learning plays a pivotal role in making representations controllable, interpretable and transferable. Despite its significance in the domain, the quest for a reliable and consistent quantitative disentanglement metric remains a major challenge. This stems from the use of diverse metrics that measure different properties and from the potential bias introduced by their design. Our work undertakes a comprehensive examination of existing popular disentanglement evaluation metrics, comparing them in terms of measuring aspects of disentanglement (viz. Modularity, Compactness, and Explicitness), detecting the factor-code relationship, and describing the degree of disentanglement. We propose a new framework for quantifying disentanglement, introducing a metric entitled EDI that leverages the intuitive concept of exclusivity and an improved factor-code relationship to minimize ad-hoc decisions. An in-depth analysis reveals that EDI measures essential properties while offering more stability than existing metrics, which supports its adoption as a standardised approach.

Paper Structure (32 sections, 14 equations, 5 figures, 5 tables)


Figures (5)

  • Figure 1: As $\alpha$ increases, the factor-code relationship becomes more non-linear. Most metrics that compute mutual information (MI) via binning decline, as do metrics that rely on predictors. EDI, in comparison, exhibits good stability.
  • Figure 2: As $\alpha$ increases, the representation becomes less modular and compact. EDI and DCI perform adequately, whereas MIG and SAP assign $0$ under partial entanglement, and Z-diff and Z-min Variance fail to register any difference.
  • Figure 3: As $\alpha$ increases, the representation becomes noisier. Explicitness-measuring metrics are expected to gradually reach 0, while modularity and compactness metrics should stay unaffected. EDI exhibits greater stability here than the other metrics.
  • Figure 4: Comparing sample efficiency (left) and time complexity (right). A metric is more sample efficient if it shows a smaller difference in its score as sample size increases. Time complexity is assessed by examining the rate of change in computation duration as the sample size increases. Here we see a clear downside of using complex predictors to model factor-code relationships.
  • Figure 5: Metric correlations on Shapes3D using Spearman's rho.
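
The rank correlation named in the Figure 5 caption can be illustrated with a minimal sketch. It computes Spearman's rho as the Pearson correlation of rank-transformed scores and builds a pairwise table across metrics. The metric names (`metric_a`, `metric_b`, `metric_c`) and their scores are hypothetical placeholders, not results from the paper, and the helper functions are assumptions made for illustration rather than the authors' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed samples."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores of three metrics on five trained models (made-up numbers).
scores = {
    "metric_a": [0.10, 0.35, 0.50, 0.70, 0.90],
    "metric_b": [0.20, 0.30, 0.65, 0.60, 0.95],
    "metric_c": [0.80, 0.75, 0.40, 0.45, 0.10],
}
names = list(scores)

# Pairwise correlation table, keyed by (row metric, column metric).
corr = {(p, q): spearman_rho(scores[p], scores[q]) for p in names for q in names}

for p in names:
    print(p, [round(corr[(p, q)], 2) for q in names])
```

Because only ranks enter the computation, two metrics that order the models identically correlate perfectly even when their score scales differ, which is why a rank-based coefficient suits comparing metrics with different ranges.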