Tuned Contrastive Learning

Chaitanya Animesh, Manmohan Chandraker

TL;DR

The paper proposes Tuned Contrastive Learning (TCL) loss, a contrastive loss function that generalizes to multiple positives and negatives in a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives.

Abstract

In recent times, contrastive-learning-based loss functions have become increasingly popular for visual self-supervised representation learning owing to their state-of-the-art (SOTA) performance. Most modern contrastive learning methods generalize only to one positive and multiple negatives per anchor. A recent state-of-the-art method, the supervised contrastive (SupCon) loss, extends self-supervised contrastive learning to the supervised setting by generalizing to multiple positives and negatives in a batch, and improves upon the cross-entropy loss. In this paper, we propose a novel contrastive loss function, the Tuned Contrastive Learning (TCL) loss, that generalizes to multiple positives and negatives in a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives. We provide a theoretical analysis of our loss function's gradient response and show mathematically how it is better than that of the SupCon loss. We empirically compare our loss function with the SupCon loss and the cross-entropy loss in the supervised setting on multiple classification datasets to show its effectiveness. We also show the stability of our loss function across a range of hyper-parameter settings. Unlike the SupCon loss, which is applied only in the supervised setting, we show how to extend TCL to the self-supervised setting and empirically compare it with various SOTA self-supervised learning methods. Hence, we show that the TCL loss achieves performance on par with SOTA methods in both supervised and self-supervised settings.

Paper Structure

This paper contains 37 sections, 7 theorems, 36 equations, 3 figures, 3 tables.

Key Result

Lemma 1

Lemma 1 gives the gradient of the SupCon loss per sample, $L_{i}^{sup}$, with respect to the normalized projection network embedding $z_{i}$.
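
For reference, this gradient can be sketched under the standard SupCon definition. The notation here is an assumption borrowed from the original SupCon formulation, not taken from this page: $P(i)$ is the set of positives for anchor $i$, $N(i)$ the negatives, $A(i) = P(i) \cup N(i)$ all other samples in the batch, and $\tau$ the temperature. With

$$L_{i}^{sup} = -\frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_{i} \cdot z_{p} / \tau)}{\sum_{a \in A(i)} \exp(z_{i} \cdot z_{a} / \tau)},$$

differentiating with respect to $z_{i}$ (treated as a free vector) gives

$$\frac{\partial L_{i}^{sup}}{\partial z_{i}} = \frac{1}{\tau} \left[ \sum_{p \in P(i)} z_{p} \left( P_{ip} - \frac{1}{|P(i)|} \right) + \sum_{n \in N(i)} z_{n} P_{in} \right], \qquad P_{ia} = \frac{\exp(z_{i} \cdot z_{a} / \tau)}{\sum_{a' \in A(i)} \exp(z_{i} \cdot z_{a'} / \tau)}.$$

The following NumPy sketch checks this expression against central finite differences on random data. The function names, the toy batch, and the value of `tau` are illustrative choices, not the authors' code.

```python
import numpy as np

def supcon_loss_i(z_i, Z, is_pos, tau=0.1):
    """Per-anchor SupCon loss; rows of Z are the other samples A(i)."""
    logits = Z @ z_i / tau                      # z_i . z_a / tau for each a
    m = logits.max()                            # shift for a stable log-sum-exp
    log_denom = m + np.log(np.sum(np.exp(logits - m)))
    return -np.mean(logits[is_pos] - log_denom)

def supcon_grad_i(z_i, Z, is_pos, tau=0.1):
    """Closed-form gradient of supcon_loss_i with respect to z_i."""
    logits = Z @ z_i / tau
    P = np.exp(logits - logits.max())
    P /= P.sum()                                # P_ia, softmax over A(i)
    X = is_pos / is_pos.sum()                   # 1/|P(i)| on positives, 0 on negatives
    return Z.T @ (P - X) / tau

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for d in range(x.size):
        step = np.zeros_like(x)
        step[d] = eps
        g[d] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

# Toy batch (assumed for illustration): 6 unit-norm embeddings, 3 of them positives.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 4))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
z_i = rng.normal(size=4)
z_i /= np.linalg.norm(z_i)
is_pos = np.array([True, True, False, False, False, True])

analytic = supcon_grad_i(z_i, Z, is_pos)
numeric = numerical_grad(lambda z: supcon_loss_i(z, Z, is_pos), z_i)
print(np.allclose(analytic, numeric, atol=1e-5))
```

The coefficient on each positive, $P_{ip} - 1/|P(i)|$, is the term the paper's analysis of hard positives centers on; the sketch only covers the SupCon side and does not attempt to reproduce the TCL loss itself.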

Figures (3)

  • Figure 1: An intuitive illustration of how the TCL loss differs from the SupCon loss. For the SupCon loss per sample, $L_{i}^{sup}$, to decrease, the anchor $z_{i}$ pulls the positive $z_{p}$ closer but also pushes the other positives away to some extent in the embedding space. The TCL loss introduces parameters that reduce this effect, which helps improve performance.
  • Figure 2: SupCon vs. TCL losses across a range of hyper-parameters: (a) batch size (top left); (b) encoder architecture (top right); (c) projector output dimension (bottom left); (d) augmentation method (bottom right).
  • Figure 3: Analysis of $k_{1}$ and $k_{2}$: (a) mean gradient from positives for SupCon and TCL at various values of $k_{1}$ (top left); (b) top-1 accuracy vs. $k_{1}$ on CIFAR-100 (top right); (c) mean gradient from negatives for SupCon and TCL with $k_{1}=50000$ and $k_{2}=1$ (bottom left); (d) mean gradient from negatives for SupCon and TCL with $k_{1}=50000$ and $k_{2}=3$ (bottom right).
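
The pull-and-push effect described for Figure 1 can be seen in a small numeric sketch. It assumes the standard SupCon form, in which the softmax denominator for each positive pair runs over all other samples, including the other positives. The similarity values and the function name are hypothetical, chosen only for illustration.

```python
import numpy as np

def per_positive_term_grad(sims, p, tau=0.1):
    """Gradient of -log softmax(sims / tau)[p] with respect to each similarity."""
    logits = sims / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = probs / tau          # every sample in the denominator contributes +P/tau
    grad[p] -= 1.0 / tau        # the numerator positive contributes an extra -1/tau
    return grad

# Hypothetical similarities z_i . z_a: indices 0 and 1 are positives, 2 and 3 negatives.
sims = np.array([0.9, 0.4, 0.1, -0.3])
grad = per_positive_term_grad(sims, p=0)
print(grad)
```

The entry for the numerator positive (index 0) is negative, so raising that similarity lowers the loss term. The entry for the other positive (index 1) is positive, so this term is lowered by reducing similarity to a sample of the same class, which is the push the caption refers to.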

Theorems & Definitions (10)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof