SINCERE: Supervised Information Noise-Contrastive Estimation REvisited

Patrick Feeney; Michael C. Hughes

TL;DR

This work proposes the Supervised InfoNCE REvisited (SINCERE) loss, a theoretically justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE better separates embeddings from different classes and improves transfer learning classification accuracy.

Abstract

The information noise-contrastive estimation (InfoNCE) loss function provides the basis of many self-supervised deep learning methods due to its strong empirical results and theoretical motivation. Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels. This SupCon loss has been widely used due to reports of good empirical performance. However, in this work we find that the prior SupCon loss formulation has questionable justification because it can encourage some images from the same class to repel one another in the learned embedding space. This problematic intra-class repulsion gets worse as the number of images sharing one class label increases. We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy. We additionally utilize probabilistic modeling to derive an information-theoretic bound that relates SINCERE loss to the symmetrized KL divergence between data-generating distributions for a target class and all other classes.
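The difference between the two losses described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `supervised_contrastive_loss`, the flag `drop_same_class`, the temperature `tau=0.1`, and the averaging over all ordered same-class pairs are assumptions made for illustration. With `drop_same_class=False`, the denominator keeps the other same-class embeddings, which is the source of intra-class repulsion attributed to SupCon; with `drop_same_class=True`, only the paired target embedding and the noise-class embeddings remain, in the spirit of SINCERE.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1, drop_same_class=True):
    """Average -log softmax score over ordered same-class pairs (s, p).

    Hypothetical helper for illustration only.
    drop_same_class=True  : denominator = pair term + noise-class terms.
    drop_same_class=False : denominator = every embedding except the anchor s,
                            so other same-class embeddings act like noise.
    Returns nan if no class has at least two members.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-norm embeddings
    sim = np.exp(z @ z.T / tau)                        # exp(cosine similarity / tau)
    n = len(labels)
    losses = []
    for s in range(n):
        for p in range(n):
            if p == s or labels[p] != labels[s]:
                continue  # only same-class pairs are scored
            if drop_same_class:
                others = labels != labels[s]           # noise classes only
            else:
                others = np.ones(n, dtype=bool)
                others[[s, p]] = False                 # everything but s and p
            denom = sim[s, p] + sim[s, others].sum()
            losses.append(-np.log(sim[s, p] / denom))
    return float(np.mean(losses))

# Toy data (assumed): six random embeddings, three per class.
rng = np.random.default_rng(0)
z_demo = rng.normal(size=(6, 4))
labels_demo = np.array([0, 0, 0, 1, 1, 1])
loss_with_repulsion = supervised_contrastive_loss(z_demo, labels_demo, drop_same_class=False)
loss_without_repulsion = supervised_contrastive_loss(z_demo, labels_demo, drop_same_class=True)
```

When every class has exactly two members, the two variants coincide because there are no extra same-class embeddings to place in the denominator. The gap appears only once a class has three or more members, which matches the abstract's remark that the problem grows with the number of images sharing a label.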

Paper Structure (41 sections, 4 theorems, 36 equations, 6 figures, 6 tables)

This paper contains 41 sections, 4 theorems, 36 equations, 6 figures, 6 tables.

Introduction
Background
Noise-Contrastive Estimation
Self-Supervised Contrastive Learning
Supervised Contrastive Learning (SupCon)
Intra-Class Repulsion
Method
Derivation of SINCERE
Self-Supervised Probabilistic Model
Supervised Probabilistic Model
Ideal SINCERE Loss
Justification for SINCERE Loss
Lower Bound on SINCERE Loss
SINCERE Loss in Practice
Analysis of Gradients
...and 26 more sections

Key Result

Proposition 3

Assume $\mathcal{X}$ is generated via the paper's self-supervised probabilistic model definition and that the target and noise PDFs are known. The probability that index $S$ is the sole draw from the target distribution is
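The closed form is not included in this summary. As a hedged sketch, the following shows what Bayes' rule gives under assumptions that the text above does not confirm: $\mathcal{X} = \{x_1, \dots, x_N\}$, the index $S$ has a uniform prior over $\{1, \dots, N\}$, $x_S$ is drawn from a target density written here as $p^{+}$, every other $x_i$ is drawn independently from a noise density written here as $p^{-}$, and $p^{-}(x_i) > 0$ for all $i$. The symbols $p^{+}$, $p^{-}$, and $N$ are notation chosen for this sketch and may differ from the paper's.

$$
p(S = s \mid \mathcal{X})
= \frac{p^{+}(x_s) \prod_{i \neq s} p^{-}(x_i)}{\sum_{j=1}^{N} p^{+}(x_j) \prod_{i \neq j} p^{-}(x_i)}
= \frac{p^{+}(x_s) / p^{-}(x_s)}{\sum_{j=1}^{N} p^{+}(x_j) / p^{-}(x_j)}
$$

The uniform prior cancels in the first step, and the second step divides the numerator and denominator by $\prod_{i=1}^{N} p^{-}(x_i)$. The result is a softmax over log density ratios, which is the form that InfoNCE-style losses approximate with a learned similarity score.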

Figures (6)

Figure 1: Visualization of supervised contrastive learning objectives for pulling together embeddings from the target class, indexed by elements of $\mathcal{T}$, and pushing away embeddings from the noise classes, indexed by elements of $\mathcal{N}$. Both objectives are defined with respect to a pair of target embeddings $z_S$ and $z_p$. Solid arrows show common effects of both methods: $z_p$ is pulled towards $z_S$ and pushed away from embeddings $z_n$ from the noise classes. Dashed arrows show SupCon's problematic intra-class repulsion: $z_p$ is pushed away from $z_a$ and $z_b$ as if they were from a noise class, despite belonging to the target class.
Figure 2: Histograms of cosine similarity values for CIFAR-10 test set nearest neighbors, comparing SupCon (left) and SINCERE (right). We plot the similarity of each test image to the nearest target image in the training set as well as the nearest noise image in the training set. The vertical dotted lines visualize the median similarity value. Our SINCERE loss reduces similarity to the nearest noise image by a substantial amount, thereby improving target-noise separation.
Figure 3: Histograms of cosine similarity values for CIFAR-10 test set nearest neighbors, comparing SupCon (left) and SINCERE (right). For each class, we plot the similarity of each test image with that class to the nearest target image in the training set as well as the nearest noise image in the training set. SINCERE loss maintains high similarity for the target distribution while lowering the cosine similarity of the noise distribution more than SupCon loss.
...and 3 more figures
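The nearest-neighbor quantities plotted in Figures 2 and 3 can be sketched as follows. This is an illustrative NumPy version, not the authors' evaluation code: the function name `nearest_target_and_noise_similarity` and its argument names are hypothetical, and it assumes every test label appears in the training set alongside at least one other class.

```python
import numpy as np

def nearest_target_and_noise_similarity(test_z, test_y, train_z, train_y):
    """For each test embedding, return the cosine similarity to its nearest
    same-class (target) and nearest different-class (noise) training embedding.
    Hypothetical helper for illustration only."""
    test_z = np.asarray(test_z, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    test_y = np.asarray(test_y)
    train_y = np.asarray(train_y)
    # Normalize rows so that dot products are cosine similarities.
    test_z = test_z / np.linalg.norm(test_z, axis=1, keepdims=True)
    train_z = train_z / np.linalg.norm(train_z, axis=1, keepdims=True)
    sim = test_z @ train_z.T                      # shape (n_test, n_train)
    same = test_y[:, None] == train_y[None, :]    # True where labels match
    target = np.where(same, sim, -np.inf).max(axis=1)
    noise = np.where(~same, sim, -np.inf).max(axis=1)
    return target, noise

# Toy example (assumed data): two training points, one test point.
train_z_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
train_y_demo = np.array([0, 1])
test_z_demo = np.array([[2.0, 0.0]])
test_y_demo = np.array([0])
target_sim, noise_sim = nearest_target_and_noise_similarity(
    test_z_demo, test_y_demo, train_z_demo, train_y_demo
)
```

Histograms of `target_sim` and `noise_sim`, with `np.median` of each marked as a vertical line, would give plots in the style the captions describe. A larger gap between the two medians indicates better target-noise separation.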

Theorems & Definitions (8)

Definition 2
Proposition 3
Definition 5
Proposition 6
Definition 7
Definition 8
Theorem 10
Theorem 11
