Understanding the Role of Equivariance in Self-supervised Learning

Yifei Wang, Kaiwen Hu, Sharut Gupta, Ziyu Ye, Yisen Wang, Stefanie Jegelka

TL;DR

We establish an information-theoretic perspective on the generalization ability of equivariant self-supervised learning (E-SSL) and identify a critical explaining-away effect that creates a synergy between the equivariant and classification tasks.

Abstract

Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL), which learns augmentation-aware features. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy encourages models to extract class-relevant features to improve their equivariant prediction, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding of the role of equivariance will inspire more principled and advanced designs in this field. Code is available at https://github.com/kaotty/Understanding-ESSL.
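The explaining-away effect named in the abstract can be illustrated in its textbook graphical-model form: two independent causes become dependent once their common effect is observed. The sketch below is a hypothetical toy model, not the paper's construction. It assumes the class $C$ and the augmentation $A$ are independent fair bits and that the observed input is $X = C \oplus A$, then computes $I(A;C)$ and $I(A;C \mid X)$ exactly. The names `mutual_information`, `joint_cax`, `mi_marginal`, and `mi_conditional` are illustrative only.

```python
import math
from itertools import product


def mutual_information(joint):
    """Return I(U;V) in bits for a joint distribution given as {(u, v): p}."""
    pu, pv = {}, {}
    for (u, v), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pv[v] = pv.get(v, 0.0) + p
    return sum(
        p * math.log2(p / (pu[u] * pv[v]))
        for (u, v), p in joint.items()
        if p > 0
    )


# Assumption (toy model): C and A are independent fair bits, X = C XOR A.
joint_cax = {(c, a, c ^ a): 0.25 for c, a in product((0, 1), repeat=2)}

# Marginal dependence I(A; C): sum X out of the joint.
joint_ac = {}
for (c, a, x), p in joint_cax.items():
    joint_ac[(a, c)] = joint_ac.get((a, c), 0.0) + p
mi_marginal = mutual_information(joint_ac)

# Conditional dependence I(A; C | X) = sum_x p(x) * I(A; C | X = x).
mi_conditional = 0.0
for x0 in (0, 1):
    px = sum(p for (c, a, x), p in joint_cax.items() if x == x0)
    cond = {(a, c): p / px for (c, a, x), p in joint_cax.items() if x == x0}
    mi_conditional += px * mutual_information(cond)

print(f"I(A;C)   = {mi_marginal:.3f} bits")
print(f"I(A;C|X) = {mi_conditional:.3f} bits")
```

In this toy setting, knowing the class tells us nothing about the augmentation until the input is observed, after which the class determines the augmentation completely. XOR is an extreme case chosen for clarity; real image data would show a weaker version of the same dependence.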

Paper Structure

This paper contains 27 sections, 10 theorems, 19 equations, 4 figures, 4 tables.

Key Result

Proposition 1

Assume that the original input $\bar{X}\in{\mathbb{R}}^{d}$ and the augmentation $A\in{\mathbb{R}}^{d'}$ are independent, and $X=[\bar{X}, A]\in{\mathbb{R}}^{d+d'}$ is obtained with direct concatenation (DC). Then, there exists a simple linear encoder that has perfect equivariance to $A$, but yields
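The direct-concatenation setup in Proposition 1 can be sketched with a few lines of plain Python. The example assumes small hypothetical dimensions ($d=3$, $d'=2$) and Gaussian samples, and it builds one possible linear encoder $W = [\,0 \mid I\,]$ that simply reads off the augmentation coordinates. It only illustrates the construction: the representation recovers $A$ exactly while being a function of $A$ alone, so it does not change when $\bar{X}$ changes. The names `linear_encode`, `d_aug`, and `x_bar_other` are illustrative.

```python
import random


def linear_encode(W, x):
    """Apply a linear map W (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Assumption: toy dimensions for the raw input and the augmentation.
d, d_aug = 3, 2

# W = [0 | I] has shape d_aug x (d + d_aug) and selects the A block of X.
W = [
    [0.0] * d + [1.0 if j == i else 0.0 for j in range(d_aug)]
    for i in range(d_aug)
]

rng = random.Random(0)
x_bar = [rng.gauss(0, 1) for _ in range(d)]   # raw input, drawn independently of A
a = [rng.gauss(0, 1) for _ in range(d_aug)]   # augmentation variable
x = x_bar + a                                  # direct concatenation X = [X_bar, A]

z = linear_encode(W, x)

# Swap in a different raw input while keeping the same augmentation.
x_bar_other = [rng.gauss(0, 1) for _ in range(d)]
z_other = linear_encode(W, x_bar_other + a)

print("Z recovers A:", z == a)
print("Z ignores X_bar:", z == z_other)
```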

Figures (4)

  • Figure 1: Comparison between different transformations for E-SSL on CIFAR-10 with ResNet-18. Note that different pretraining tasks may have different numbers of classes (e.g., $4$ for rotation and $2$ for horizontal flip). The baseline is a randomly initialized encoder with 34% test accuracy under linear probing.
  • Figure 2: The causal diagram of equivariant self-supervised learning. The observed variables are in grey. $C$: class; $S$: style; $\bar{A}$: intrinsic equivariance variable; $\bar{X}$: raw input; $A$: augmentation; $X$: augmented input; $Z$: representation.
  • Figure 3: A controlled experiment on the influence of class information on equivariant prediction. We include three methods: 1) equivariant prediction (baseline); 2) jointly minimizing equivariant and classification losses ("+cls"); 3) minimizing the equivariant loss while adversarially maximizing the classification loss (Ganin et al., 2015) ("-cls"). We study rotation prediction for (a), (b), (c) and (d), horizontal flip for (e), and four-fold blur for (f).
  • Figure 4: The model of this experiment. $X$: raw input; $Z$: representation; $R$: rotation prediction; $C$: class prediction. For rotation prediction, unless specified, the gradient flowing from the classifier to the encoder is detached.
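Figure 1 notes that the pretext tasks differ in their number of classes ($4$ for rotation, $2$ for horizontal flip). As a minimal sketch of how such labelled views could be generated, the code below builds both sets of (view, label) pairs from a single image stored as a nested list. It is an illustrative assumption about data preparation, not the authors' pipeline; the helper names `rot90`, `rotation_pretext_batch`, and `hflip_pretext_batch` are hypothetical.

```python
def rot90(img):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_pretext_batch(img):
    """Return four views rotated by 0/90/180/270 degrees with labels 0..3."""
    views, labels = [], []
    current = [list(row) for row in img]
    for k in range(4):
        views.append(current)
        labels.append(k)          # label k means "rotated by k * 90 degrees"
        current = rot90(current)
    return views, labels


def hflip_pretext_batch(img):
    """Return the original and the horizontally flipped view with labels 0/1."""
    original = [list(row) for row in img]
    flipped = [list(row)[::-1] for row in img]
    return [original, flipped], [0, 1]


image = [[1, 2],
         [3, 4]]

rot_views, rot_labels = rotation_pretext_batch(image)
flip_views, flip_labels = hflip_pretext_batch(image)

for view, label in zip(rot_views, rot_labels):
    print(label, view)
```

An encoder trained on these pairs would then be asked to predict the label from the view alone, which is the equivariant prediction task the figures refer to.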

Theorems & Definitions (20)

  • Proposition 1: Useless equivariance
  • Lemma 1: Explaining-away in E-SSL
  • Theorem 1: Class features improve equivariant prediction
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • proof
  • proof
  • Lemma 2: Theorem 3.5 (rephrased) (Koller & Friedman, 2009)
  • proof
  • ...and 10 more