SINCERE: Supervised Information Noise-Contrastive Estimation REvisited

Patrick Feeney, Michael C. Hughes

TL;DR

This work proposes the Supervised InfoNCE REvisited (SINCERE) loss, a theoretically justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE better separates embeddings from different classes and improves transfer learning classification accuracy.

Abstract

The information noise-contrastive estimation (InfoNCE) loss function provides the basis of many self-supervised deep learning methods due to its strong empirical results and theoretical motivation. Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels. This SupCon loss has been widely used due to reports of good empirical performance. However, in this work we find that the prior SupCon loss formulation has questionable justification because it can encourage some images from the same class to repel one another in the learned embedding space. This problematic intra-class repulsion gets worse as the number of images sharing one class label increases. We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy. We additionally utilize probabilistic modeling to derive an information-theoretic bound that relates SINCERE loss to the symmetrized KL divergence between data-generating distributions for a target class and all other classes.
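
The intra-class repulsion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pair_losses`, the temperature `tau`, and the exact form of both denominators are assumptions made for illustration. The sketch scores one same-class pair (`s`, `p`) with a softmax-style loss and changes only which embeddings appear in the denominator: every other embedding (SupCon-like), or only the partner plus embeddings from other classes (SINCERE-like).

```python
import numpy as np

def pair_losses(z, labels, s, p, tau=0.1):
    """Return (supcon_like, sincere_like) losses for one same-class pair (s, p).

    Illustrative assumption: both losses are -log softmax of the (s, p)
    similarity and differ only in which embeddings enter the denominator.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-norm embeddings
    sims = z @ z[p] / tau                             # scaled similarity of every item to z_p
    idx = np.arange(len(z))
    noise = labels != labels[p]                       # items from other classes

    # SupCon-like: every item except z_p itself, so same-class items are pushed away too.
    supcon_mask = idx != p
    # SINCERE-like: only the partner z_s and the noise items.
    sincere_mask = noise | (idx == s)

    def nll(mask):
        logits = sims[mask]
        m = logits.max()                              # stabilize the log-sum-exp
        return -(sims[s] - (m + np.log(np.exp(logits - m).sum())))

    return nll(supcon_mask), nll(sincere_mask)

# Usage: four items in class 0, two each in classes 1 and 2.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
labels = np.array([0, 0, 0, 0, 1, 1, 2, 2])
supcon, sincere = pair_losses(z, labels, s=0, p=1)
print(f"SupCon-like: {supcon:.3f}  SINCERE-like: {sincere:.3f}")
```

Under these assumptions, adding more embeddings that share the class of `z_p` adds terms to the SupCon-like denominator but leaves the SINCERE-like loss unchanged, which mirrors the claim that the repulsion worsens as a class grows.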

Paper Structure

This paper contains 41 sections, 4 theorems, 36 equations, 6 figures, and 6 tables.

Key Result

Proposition 3

Assume $\mathcal{X}$ is generated via the model in Def. def:selfsup_model and that the target and noise PDFs are known. The probability that index $S$ is the sole draw from the target distribution is
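
A posterior of this kind can be sketched with Bayes' rule. The setup below is an assumption for illustration and may not match the paper's notation: $\mathcal{X} = \{x_1, \dots, x_N\}$, the index $S$ is uniform a priori, $x_S$ is drawn from a target PDF $p^{+}$, every other $x_j$ is drawn independently from a noise PDF $p^{-}$, and $p^{-}(x_j) > 0$ for all $j$.

```latex
\begin{align}
p(S = s \mid \mathcal{X})
  &= \frac{\frac{1}{N}\, p^{+}(x_s) \prod_{j \neq s} p^{-}(x_j)}
          {\sum_{i=1}^{N} \frac{1}{N}\, p^{+}(x_i) \prod_{j \neq i} p^{-}(x_j)} \\
  &= \frac{p^{+}(x_s) / p^{-}(x_s)}
          {\sum_{i=1}^{N} p^{+}(x_i) / p^{-}(x_i)}
\end{align}
```

The second line divides the numerator and denominator by $\frac{1}{N} \prod_{j=1}^{N} p^{-}(x_j)$, so under these assumptions the posterior depends on the data only through the density ratios $p^{+}(x_i) / p^{-}(x_i)$.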

Figures (6)

  • Figure 1: Visualization of supervised contrastive learning objectives for pulling together embeddings from the target class, indexed by elements of $\mathcal{T}$, and pushing away embeddings from the noise classes, indexed by elements of $\mathcal{N}$. Both objectives are defined with respect to a pair of target embeddings $z_S$ and $z_p$. Solid arrows show common effects of both methods: $z_p$ is pulled towards $z_S$ and pushed away from embeddings $z_n$ from the noise classes. Dashed arrows show SupCon's problematic intra-class repulsion: $z_p$ is pushed away from $z_a$ and $z_b$ as if they were from a noise class, despite belonging to the target class.
  • Figure 2: Histograms of cosine similarity values for CIFAR-10 test set nearest neighbors, comparing SupCon (left) and SINCERE (right). We plot the similarity of each test image to the nearest target image in the training set as well as the nearest noise image in the training set. The vertical dotted lines visualize the median similarity value. Our SINCERE loss reduces similarity to the nearest noise image by a substantial amount, thereby improving target-noise separation.
  • Figure 3: Histograms of cosine similarity values for CIFAR-10 test set nearest neighbors, comparing SupCon (left) and SINCERE (right). For each class, we plot the similarity of each test image with that class to the nearest target image in the training set as well as the nearest noise image in the training set. SINCERE loss maintains high similarity for the target distribution while lowering the cosine similarity of the noise distribution more than SupCon loss.
  • ...and 3 more figures
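
The nearest-neighbor comparison described in the captions of Figures 2 and 3 can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: the function name `nearest_neighbor_similarities` and its argument names are hypothetical, and the sketch assumes every test class has at least one training example from the same class and one from a different class.

```python
import numpy as np

def nearest_neighbor_similarities(train_z, train_y, test_z, test_y):
    """For each test embedding, return the cosine similarity to its nearest
    same-class (target) and nearest other-class (noise) training embedding."""
    train_z = train_z / np.linalg.norm(train_z, axis=1, keepdims=True)
    test_z = test_z / np.linalg.norm(test_z, axis=1, keepdims=True)
    sims = test_z @ train_z.T                        # shape (n_test, n_train)
    same = test_y[:, None] == train_y[None, :]       # True where labels match
    target = np.where(same, sims, -np.inf).max(axis=1)
    noise = np.where(~same, sims, -np.inf).max(axis=1)
    return target, noise

# Usage on toy 2-D embeddings with two classes.
train_z = np.array([[1.0, 0.0], [0.0, 1.0]])
train_y = np.array([0, 1])
test_z = np.array([[2.0, 0.0], [1.0, 1.0]])
test_y = np.array([0, 1])
target, noise = nearest_neighbor_similarities(train_z, train_y, test_z, test_y)
print(np.median(target), np.median(noise))  # medians, as marked by the dotted lines
```

A larger gap between the target and noise similarities indicates better target-noise separation in the sense used by the captions.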

Theorems & Definitions (8)

  • Definition 2
  • Proposition 3
  • Definition 5
  • Proposition 6
  • Definition 7
  • Definition 8
  • Theorem 10
  • Theorem 11