Maximizing Incremental Information Entropy for Contrastive Learning

Jiansong Zhang, Zhuoqin Yang, Xu Wu, Xiaoling Luo, Peizhong Liu, Linlin Shen

Abstract

Contrastive learning has achieved remarkable success in self-supervised representation learning, often guided by information-theoretic objectives such as mutual information maximization. Motivated by the limitations of static augmentations and rigid invariance constraints, we propose IE-CL (Incremental-Entropy Contrastive Learning), a framework that explicitly optimizes the entropy gain between augmented views while preserving semantic consistency. Our theoretical framework reframes the challenge by identifying the encoder as an information bottleneck and proposes a joint optimization of two components: a learnable transformation for entropy generation and an encoder regularizer for its preservation. Experiments on CIFAR-10/100, STL-10, and ImageNet demonstrate that IE-CL consistently improves performance under small-batch settings. Moreover, our core modules can be seamlessly integrated into existing frameworks. This work bridges theoretical principles and practice, offering a new perspective in contrastive learning.

Paper Structure (39 sections, 2 theorems, 29 equations, 8 figures, 7 tables, 1 algorithm)

Key Result

Lemma 3.1

Let $Z = f_\theta(X)$ be the embedding of input $X$ and $Z^+$ the corresponding positive sample. Then, based on the Donsker--Varadhan representation, the mutual information satisfies
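A widely used bound of this type, which we assume matches the intent of Lemma 3.1, is the InfoNCE bound $I(Z; Z^+) \ge \log N - \mathcal{L}_{\mathrm{InfoNCE}}$ for a batch of $N$ candidates, so lowering the InfoNCE loss raises a lower bound on the mutual information. The minimal NumPy sketch below computes an InfoNCE loss with cosine similarity and a temperature, then evaluates that bound. The helper names (`info_nce`, `mi_lower_bound`), the temperature of 0.1, and the toy embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def info_nce(z: np.ndarray, z_pos: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss for N anchors; the other N-1 positives in the batch act as negatives."""
    # L2-normalise so that the dot product is cosine similarity
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / temperature                    # (N, N); diagonal = positive pairs
    logits = logits - logits.max(axis=1, keepdims=True)   # row shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy with diagonal labels

def mi_lower_bound(loss: float, n: int) -> float:
    """Assumed form of the bound: I(Z; Z+) >= log N - L_InfoNCE."""
    return float(np.log(n) - loss)

rng = np.random.default_rng(0)
n, d = 256, 64
z = rng.normal(size=(n, d))
aligned = z + 0.1 * rng.normal(size=(n, d))   # positives close to their anchors
unrelated = rng.normal(size=(n, d))           # positives independent of their anchors

for name, pos in [("aligned", aligned), ("unrelated", unrelated)]:
    loss = info_nce(z, pos)
    print(f"{name:9s} loss={loss:.3f}  MI bound={mi_lower_bound(loss, n):.3f}")
```

Aligned positives give a small loss and a bound close to $\log N$, while unrelated positives push the bound toward or below zero; the bound can never exceed $\log N$, which is one reason batch size matters for this family of objectives.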

Figures (8)

  • Figure 1: Overview of the proposed IE-CL. We define incremental entropy as the absolute change in entropy induced by classical contrastive augmentations (see Definition 3.2). To optimize the contrastive learning process, we propose the Sample Augmentation Incremental Block (SAIB), a learnable module that ensures the local Jacobian determinant is greater than 1. By incorporating sample-level incremental entropy into contrastive optimization, we establish a principled framework that improves the effectiveness of self-supervised representation learning.
  • Figure 2: Illustration of the data augmentation operators studied. The non-isometric transformation operator SAIB has learnable parameters, enabling augmentation for contrastive learning that does not rely on hand-crafted priors. Visualizing the changes from 100 epochs (d) to 400 epochs (h) shows that the KL divergence effectively constrains the incremental entropy, preventing collapse.
  • Figure 3: Ablation study of the relationship between SAIB and previous pretext tasks. Images were resized to 224$\times$224 and the augmentation strength settings from pmlr-v119-chen20j were applied, followed by pairwise tests with SAIB placed on both sides of the contrastive learning framework.
  • Figure 4: Comparison of SSL training-loss curves on ImageNet-1K with the proposed incremental information entropy maximization (SAIB), using MoCo-v2 as the baseline.
  • Figure 5: Variation of the incremental entropy $\Delta H(X)$ on the query side and of the InfoNCE loss over training iterations.
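Figure 1 links SAIB's entropy gain to a local Jacobian determinant greater than 1. The standard change-of-variables identity for differential entropy makes that link explicit: for an invertible, differentiable map $g$, $h(g(X)) = h(X) + \mathbb{E}[\log|\det J_g(X)|]$, so $|\det J_g| > 1$ everywhere implies an entropy increase. The sketch below checks this in the simplest setting where both sides are closed-form, a linear map $A$ applied to a Gaussian $X \sim \mathcal{N}(0, \Sigma)$, for which $\Delta h = \log|\det A|$. The Gaussian input, the three linear maps, and the helper names are illustrative assumptions; SAIB itself is a learnable, non-isometric module that is not reproduced here.

```python
import numpy as np

def gaussian_entropy(cov: np.ndarray) -> float:
    """Differential entropy (nats) of a d-dimensional Gaussian with covariance `cov`."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def entropy_increment_linear(cov: np.ndarray, a: np.ndarray) -> float:
    """h(AX) - h(X) for X ~ N(0, cov), computed from the transformed covariance A cov A^T."""
    return gaussian_entropy(a @ cov @ a.T) - gaussian_entropy(cov)

rng = np.random.default_rng(0)
d = 4
b = rng.normal(size=(d, d))
cov = b @ b.T + d * np.eye(d)                       # random positive-definite covariance

expand = np.diag([1.5, 1.2, 1.0, 1.1])              # |det A| = 1.98 > 1
rotate, _ = np.linalg.qr(rng.normal(size=(d, d)))   # orthogonal (isometric), |det A| = 1
shrink = 0.5 * np.eye(d)                            # |det A| = 0.0625 < 1

for name, a in [("expand", expand), ("rotate", rotate), ("shrink", shrink)]:
    delta = entropy_increment_linear(cov, a)
    log_abs_det = np.linalg.slogdet(a)[1]           # log|det A|
    print(f"{name:7s} dh = {delta:+.4f} nats   log|det A| = {log_abs_det:+.4f}")
```

The isometric rotation leaves the entropy unchanged, which is why a non-isometric operator is needed to produce a positive increment; the contracting map lowers it.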

Theorems & Definitions (6)

  • Lemma 3.1: Equivalence between InfoNCE minimization and mutual information maximization
  • Proof
  • Definition 3.2 (Incremental Information Entropy): Based on Shannon entropy, the change in information entropy of a sample $X$ when a transformation $g$ is applied to it, yielding $X' = g(X)$, is called the incremental information entropy.
  • Proof
  • Proposition 3.3: Principle of Constrained Incremental Entropy Maximization
  • Proof: theoretical argument
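Definition 3.2 measures how much a transformation $g$ changes the Shannon entropy of a sample, the Figure 1 caption takes the absolute change $|H(X') - H(X)|$, and the Figure 2 caption reports that a KL divergence term keeps this increment in check. The toy sketch below makes these quantities concrete on an 8-bit image, under several loudly stated assumptions: entropy is estimated from a 256-bin intensity histogram, Gaussian noise (`add_noise`) is a hypothetical stand-in for a learnable transformation such as SAIB, the constraint is taken as $\mathrm{KL}(p_{X'} \,\|\, p_X)$ with add-one smoothing, and `lam` is a hypothetical penalty weight. It is not the paper's estimator or objective.

```python
import numpy as np

def intensity_hist(x: np.ndarray, smoothing: float = 0.0) -> np.ndarray:
    """Normalised 256-bin intensity histogram of an 8-bit image (optional add-k smoothing)."""
    counts = np.bincount(x.ravel(), minlength=256).astype(float) + smoothing
    return counts / counts.sum()

def shannon_entropy(x: np.ndarray) -> float:
    """Shannon entropy H(X) in bits of the empirical intensity distribution."""
    p = intensity_hist(x)
    p = p[p > 0]                                  # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())

def incremental_entropy(x: np.ndarray, x_aug: np.ndarray) -> float:
    """|H(X') - H(X)|, the absolute entropy change described in the Figure 1 caption."""
    return abs(shannon_entropy(x_aug) - shannon_entropy(x))

def kl_bits(x_aug: np.ndarray, x: np.ndarray) -> float:
    """KL(p_X' || p_X) in bits; add-one smoothing keeps it finite (an assumption)."""
    p, q = intensity_hist(x_aug, 1.0), intensity_hist(x, 1.0)
    return float(np.sum(p * np.log2(p / q)))

def add_noise(x: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a transformation g: additive Gaussian noise plus clipping."""
    noisy = x.astype(float) + rng.normal(0.0, sigma, size=x.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
x = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)  # low-contrast toy image
lam = 0.5                                                   # hypothetical penalty weight

results = {}
for sigma in (0.0, 5.0, 60.0):
    x_aug = add_noise(x, sigma, rng)
    d_h, kl = incremental_entropy(x, x_aug), kl_bits(x_aug, x)
    results[sigma] = (d_h, kl)
    print(f"sigma={sigma:5.1f}  dH={d_h:.3f} bits  KL={kl:.3f} bits  dH - lam*KL={d_h - lam * kl:+.3f}")
```

Stronger noise raises the incremental entropy but also moves the augmented distribution further from the original, so under these assumptions an unconstrained maximization of $\Delta H$ would favor destroying the sample; a penalized score of this kind illustrates why Proposition 3.3 frames the maximization as constrained.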