Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning

Ruiqian Nai, Zixin Wen, Ji Li, Yuanzhi Li, Yang Gao

TL;DR

We show that dimension-wise disentangled representations are unnecessary for a fundamental downstream task, abstract visual reasoning, and our findings suggest that the informativeness of representations is a better indicator of downstream performance than disentanglement.

Abstract

In representation learning, a disentangled representation is highly desirable as it encodes generative factors of data in a separable and compact pattern. Researchers have advocated leveraging disentangled representations to complete downstream tasks with encouraging empirical evidence. This paper further investigates the necessity of disentangled representation in downstream applications. Specifically, we show that dimension-wise disentangled representations are unnecessary on a fundamental downstream task, abstract visual reasoning. We provide extensive empirical evidence against the necessity of disentanglement, covering multiple datasets, representation learning methods, and downstream network architectures. Furthermore, our findings suggest that the informativeness of representations is a better indicator of downstream performance than disentanglement. Finally, the positive correlation between informativeness and disentanglement explains the claimed usefulness of disentangled representations in previous works. The source code is available at https://github.com/Richard-coder-Nai/disentanglement-lib-necessity.git.

Paper Structure (26 sections, 4 equations, 27 figures, 5 tables)

Figures (27)

  • Figure 1: An example of an RPM (Raven's Progressive Matrix) on 3DShapes, from van Steenkiste et al. (2019).
  • Figure 2: Average test accuracy on 3DShapes throughout training. The shaded area indicates the maximum and minimum values. We select the Stage-1 models with the best $\overline{\text{WReN}}$ or $\overline{\text{Trans.}}$ among 3600 checkpoints on 3DShapes. Stage-1 models with a disentanglement inductive bias (DisVAEs) are not necessarily better than those without such a bias (BYOL) in terms of sample efficiency and final accuracy.
  • Figure 3: Rank correlations between $\overline{\text{WReN}}$ or $\overline{\text{Trans.}}$ and representation metrics on 3DShapes. We denote the step with the highest validation accuracy as "Best". The brighter the panel, the more correlated the representation metric is with the downstream performance.
  • Figure 4: Representation metrics versus $\overline{\text{WReN}}$ at step 10000, where Stage-1 models are BYOL, and the dataset is 3DShapes. We can observe a strong positive correlation between the informativeness metric scores and downstream accuracy.
  • Figure 5: (a) Correlations between metrics and (b) correlations between adjusted metrics and downstream accuracy when using the DisVAEs-WReN pipeline on 3DShapes. Disentanglement metrics exhibit positive correlations with informativeness. Yet when conditioned on similar informativeness, their adjusted versions show only mild correlations with downstream accuracy.
  • ...and 22 more figures
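
Several captions above refer to rank correlations between a representation metric and downstream accuracy. As a rough illustration of what such a number measures, the sketch below computes a Spearman rank correlation in plain Python. It is not taken from the paper's code: the function names `average_ranks` and `spearman` are hypothetical, and the toy scores are invented purely to show the mechanics, not reported results.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented toy values, one entry per hypothetical Stage-1 checkpoint.
metric_scores = [0.10, 0.35, 0.20, 0.80, 0.55]
downstream_acc = [0.42, 0.61, 0.50, 0.93, 0.70]
rho = spearman(metric_scores, downstream_acc)
```

Because only the ordering of the checkpoints matters, a metric that ranks models in the same order as their downstream accuracy gives a correlation of 1 regardless of the raw score scales, which is why a rank correlation suits comparing metrics with different ranges.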