Temperature-Free Loss Function for Contrastive Learning

Bum Jun Kim; Sang Woo Kim


TL;DR

This paper addresses two problems with InfoNCE-based contrastive learning: performance is sensitive to the temperature, and that temperature must be tuned as a hyperparameter. The proposed loss does not divide cosine similarities by a temperature. Instead, it maps each cosine similarity through the log-odds function, $\ln\frac{1+\cos\theta}{1-\cos\theta} = 2\,\operatorname{artanh}(\cos\theta)$ (the log-odds of $(1+\cos\theta)/2$), and feeds the results into the softmax. A theoretical analysis shows that temperature division can cause problematic gradient behavior. In contrast, the log-odds mapping keeps gradients alive and drives them to zero only at the optimum. Empirically, the method matches or surpasses temperature-based baselines on five benchmarks: image classification, graph representation learning, anomaly detection, NLP, and sequential recommendation. It also requires no temperature tuning.
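The difference between temperature scaling and the log-odds mapping can be compared with a small sketch. This is a minimal, dependency-free illustration, not the authors' code. It assumes a single anchor with a plain list of cosine similarities, with the positive first by default. The function names `info_nce_logits` and `info_nce_loss` and the clipping constant `EPS` are hypothetical choices. Practical implementations operate on batched tensors instead.

```python
import math

# Hypothetical clipping margin: artanh(c) diverges at c = +-1, so cosines are
# clamped to [-1 + EPS, 1 - EPS] before the temperature-free mapping.
EPS = 1e-6


def info_nce_logits(cos_sims, temperature=None):
    """Turn cosine similarities into softmax logits.

    temperature=None: temperature-free mapping 2 * artanh(c) = log((1 + c) / (1 - c)),
    i.e. the log-odds of (1 + c) / 2.
    temperature=t: standard InfoNCE scaling c / t.
    """
    if temperature is None:
        clipped = [min(max(c, -1.0 + EPS), 1.0 - EPS) for c in cos_sims]
        return [2.0 * math.atanh(c) for c in clipped]
    return [c / temperature for c in cos_sims]


def info_nce_loss(cos_sims, positive_index=0, temperature=None):
    """Cross-entropy of the softmax over the logits, with the positive as the target."""
    logits = info_nce_logits(cos_sims, temperature)
    m = max(logits)  # subtract the max logit for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[positive_index]


# Same near-ideal similarities: positive close to 1, negatives close to -1
sims = [0.999, -0.999, -0.999]
print(info_nce_loss(sims, temperature=0.5))  # temperature-scaled InfoNCE
print(info_nce_loss(sims))                   # temperature-free variant
```

With these near-perfect similarities, the temperature-scaled loss stays close to a positive floor that depends on the temperature. The temperature-free loss keeps shrinking as the positive approaches 1 and the negatives approach -1, because the mapped logits are unbounded.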

Abstract

As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant way to implement contrastive learning is the InfoNCE loss, which learns data representations by capturing the similarities between pairs. Despite its success, the InfoNCE loss requires tuning a temperature, a core hyperparameter for calibrating similarity scores. Several studies have emphasized the temperature's importance and the sensitivity of performance to its value. Even so, finding a suitable temperature still requires extensive trial-and-error experiments, which makes the InfoNCE loss harder to adopt. To address this difficulty, we propose a novel method to deploy the InfoNCE loss without a temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to enabling hyperparameter-free deployment, the proposed method even yielded a performance gain in contrastive learning. Our detailed theoretical analysis reveals that the current practice of temperature scaling in the InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
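The following derivation shows one way a fixed temperature interacts badly with bounded similarities. It is our own illustration and not necessarily the argument made in the paper. Take $N$ candidates with cosine similarities $s_j \in [-1, 1]$, the positive at index $1$, and softmax probabilities $p_j$. With temperature $\tau$:

$$
\begin{aligned}
\mathcal{L}_\tau &= -\log\frac{e^{s_1/\tau}}{\sum_{j=1}^{N} e^{s_j/\tau}},
\qquad
\frac{\partial \mathcal{L}_\tau}{\partial s_1} = \frac{p_1 - 1}{\tau}, \\
\min_{s}\,\mathcal{L}_\tau &= \log\!\left(1 + (N-1)\,e^{-2/\tau}\right) > 0
\quad\text{at } s_1 = 1,\; s_{j \neq 1} = -1 .
\end{aligned}
$$

For any finite $\tau$, $p_1 < 1$. The gradient with respect to $s_1$ therefore stays nonzero even at this best attainable configuration. Under the temperature-free mapping, the logits $\ell_j = 2\,\operatorname{artanh}(s_j) = \ln\frac{1+s_j}{1-s_j}$ are unbounded as $s_j \to \pm 1$. As a result, the loss has infimum $0$ rather than a temperature-dependent positive floor.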

