Representation Learning via Invariant Causal Mechanisms

Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell

TL;DR

The paper tackles the lack of theoretical grounding for self-supervised representation learning, introducing a causal framework that separates content from style and uses augmentations as style interventions. It proposes ReLIC, an objective that enforces invariant prediction of proxy targets across augmentations via an explicit regularizer, yielding stronger generalization guarantees. It further generalizes contrastive learning through the notion of refinements, showing that learning on refinements can suffice for downstream task generalization and offering an alternative to mutual information explanations. Empirically, ReLIC improves robustness and out-of-distribution generalization on ImageNet and achieves above-human performance on 51 of 57 Atari games, illustrating practical impact across vision and reinforcement learning domains.

Abstract

Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representation learning using a causal framework. We show how data augmentations can be more effectively utilized through explicit invariance constraints on the proxy classifiers employed during pretraining. Based on this, we propose a novel self-supervised objective, Representation Learning via Invariant Causal Mechanisms (ReLIC), that enforces invariant prediction of proxy targets across augmentations through an invariance regularizer which yields improved generalization guarantees. Further, using causality we generalize contrastive learning, a particular kind of self-supervised method, and provide an alternative theoretical explanation for the success of these methods. Empirically, ReLIC significantly outperforms competing methods in terms of robustness and out-of-distribution generalization on ImageNet, while also significantly outperforming these methods on Atari achieving above human-level performance on $51$ out of $57$ games.
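The abstract describes the objective only in words: a proxy classification loss plus a regularizer that keeps proxy-target predictions invariant across augmentations. A minimal sketch of that idea, assuming a two-view contrastive setup, might look as follows. The function name `relic_style_loss`, the temperature `tau`, the weight `alpha`, and the use of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def relic_style_loss(z1, z2, tau=0.1, alpha=1.0):
    """Contrastive cross entropy plus a KL invariance penalty (sketch).

    z1, z2: (N, D) embeddings of the same N images under two augmentations.
    tau, alpha: hypothetical temperature and regularizer weight.
    """
    # Cosine similarity via L2-normalized embeddings (an assumption).
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

    # Proxy task: identify which image in the batch each embedding came from.
    logp_12 = log_softmax(z1 @ z2.T / tau)  # view 1 scored against view 2
    logp_21 = log_softmax(z2 @ z1.T / tau)  # view 2 scored against view 1

    # Cross entropy with the matching image (the diagonal) as the target.
    idx = np.arange(z1.shape[0])
    xent = -0.5 * (logp_12[idx, idx].mean() + logp_21[idx, idx].mean())

    # Invariance penalty: KL between the two predicted proxy distributions.
    kl = (np.exp(logp_12) * (logp_12 - logp_21)).sum(axis=1).mean()

    return xent + alpha * kl, xent, kl

rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.1 * rng.normal(size=(8, 16))  # stand-in for a second augmentation
total, xent, kl = relic_style_loss(view_a, view_b)
print(f"total={total:.4f} xent={xent:.4f} kl={kl:.4f}")
```

When both views produce identical proxy predictions the KL term vanishes, so the penalty only acts where predictions differ across augmentations.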

Paper Structure

This paper contains 40 sections, 7 theorems, 19 equations, 7 figures, 10 tables.

Key Result

Theorem 1

Let $\mathcal{Y}=\{Y_t\}_{t=1}^{T}$ be a family of downstream tasks. Let $Y^{R}$ be a refinement for all tasks in $\mathcal{Y}$. If $f(X)$ is an invariant representation for $Y^{R}$ under style interventions $S$, then $f(X)$ is an invariant representation for all tasks in $\mathcal{Y}$ under style interventions $S$, i.e. $p^{do(s_{i})}(Y^{R}\mid f(X))=p^{do(s_{j})}(Y^{R}\mid f(X)) \Rightarrow p^{do(s_{i})}(Y_{t}\mid f(X))=p^{do(s_{j})}(Y_{t}\mid f(X))$ for all $Y_{t}\in\mathcal{Y}$ and for all $s_{i}, s_{j}\in\mathcal{S}$ with $p^{do(s_{i})}=p^{do(S = s_{i})}$. Thus, $f(X)$ is a repr...
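The intuition behind the theorem can be illustrated with a toy example. The sketch below assumes a refinement with four classes (aquatic animal, aquatic non-animal, non-aquatic animal, non-aquatic non-animal) and two coarser tasks that are obtained by merging refinement classes. The probabilities are hypothetical values chosen for illustration. If the predictive distribution over the refinement is the same under two style interventions, marginalizing it onto either coarser task necessarily yields the same distribution as well.

```python
# Refinement labels: each is a pair (habitat, kind).
FINE_LABELS = [
    ("aquatic", "animal"),
    ("aquatic", "non-animal"),
    ("non-aquatic", "animal"),
    ("non-aquatic", "non-animal"),
]

def coarsen(p_fine, task_index):
    """Marginalize a distribution over FINE_LABELS onto one coarser task.

    task_index 0 selects aquatic vs non-aquatic, 1 selects animal vs non-animal.
    """
    p_coarse = {}
    for label, prob in zip(FINE_LABELS, p_fine):
        key = label[task_index]
        p_coarse[key] = p_coarse.get(key, 0.0) + prob
    return p_coarse

# Hypothetical predictions p(Y^R | f(x)) under two style interventions.
# They are equal, i.e. f is invariant for the refinement on this input.
p_style_i = [0.125, 0.5, 0.125, 0.25]
p_style_j = [0.125, 0.5, 0.125, 0.25]

tasks = ["aquatic vs non-aquatic", "animal vs non-animal"]
for task_index, name in enumerate(tasks):
    coarse_i = coarsen(p_style_i, task_index)
    coarse_j = coarsen(p_style_j, task_index)
    print(name, coarse_i, "invariant:", coarse_i == coarse_j)
```

The converse does not hold in general: two different refinement distributions can share the same coarse marginals, which is why invariance is enforced on the refinement rather than on an individual downstream task.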

Figures (7)

  • Figure 1: (a) Causal graph formalizing assumptions about content and style of the data and the relationship between targets and proxy tasks. (b) ReLIC objective. KL refers to the Kullback-Leibler divergence, while x-entropy denotes cross entropy.
  • Figure 2: Distribution of the linear discriminant ratio ($F_{\text{LDA}}$, see text) of $f$ for ReLIC, SimCLR and AMDIM ($y$-axis clipped to aid visualization).
  • Figure 3: Visual representation of invariance penalty. Shaded region denotes set of augmentations around an image.
  • Figure 4: Visualization of a refinement of a set of tasks. The tasks are to classify aquatic vs non-aquatic life and animal vs non-animal, with the individual class boundaries denoted by the dashed and dotted black lines. A refinement for these tasks is to classify aquatic animal vs aquatic non-animal vs non-aquatic animal vs non-aquatic non-animal, with the class boundaries given in teal. The ellipse indicates the set of points induced by augmenting the image of the ship.
  • Figure 5: The ImageNet-C dataset consists of 15 types of corruptions from noise, blur, weather, and digital categories. Each type of corruption has five levels of severity, resulting in 75 distinct corruptions. See different severity levels in Figure \ref{severities}.
  • ...and 2 more figures

Theorems & Definitions (11)

  • Theorem 1
  • Lemma 1: Concentration
  • Theorem 2: Generalization. Adapted from Lemma B.2 of saunshi2019theoretical
  • Proof: Proof of Lemma 1 (Concentration)
  • Theorem 3: Connectedness erdHos1960evolution
  • Definition 1: Diameter
  • Theorem 4: Diameter of random graphs frieze2016introduction
  • Definition 2
  • Lemma 2
  • Definition 4
  • ...and 1 more