Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise

Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li

TL;DR

This work reframes data augmentation in contrastive learning as learning beneficial noise, introducing Positive-incentive Noise (Pi-noise) and its Gaussian surrogate to connect augmentation to task mutual information. It proves that standard augmentations are effectively point estimates of Pi-noise and develops PiNDA, a learnable Pi-noise generator that produces augmentation views without assuming data type, while remaining compatible with existing contrastive models. Theoretical developments link task entropy to the contrastive loss via an auxiliary variable, enabling a practical optimization that reduces to maximizing a contrastive objective while learning the augmentation distribution. Experimental results on non-vision and vision datasets show PiNDA improves classification or retrieval metrics, with visualizations illustrating that the learned augmentations resemble meaningful style or background changes and converge rapidly. Overall, PiNDA offers a general, unsupervised approach to data augmentation that can extend contrastive learning to diverse domains and improve augmentation stability and effectiveness.

Abstract

Inspired by the idea of Positive-incentive Noise (Pi-Noise or $\pi$-Noise) that aims at learning the reliable noise beneficial to tasks, we scientifically investigate the connection between contrastive learning and $\pi$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution to quantitatively measure the difficulty of the specific contrastive model under the information theory framework, we properly define the task entropy, the core concept of $\pi$-noise, of contrastive learning. It is further proved that the predefined data augmentation in the standard contrastive learning paradigm can be regarded as a kind of point estimation of $\pi$-noise. Inspired by the theoretical study, a framework that develops a $\pi$-noise generator to learn the beneficial noise (instead of estimation) as data augmentations for contrast is proposed. The designed framework can be applied to diverse types of data and is also completely compatible with the existing contrastive models. From the visualization, we surprisingly find that the proposed method successfully learns effective augmentations.

Paper Structure (24 sections, 26 equations, 7 figures, 5 tables, 2 algorithms)

Figures (7)

  • Figure 1: Visualization of the $\pi$-noise learned by SimCLR with PiNDA on STL-10. We aim to learn the $\pi$-noise with a Gaussian distribution $\mathcal{N}(\bm \mu, \bm \Sigma)$. We fix $\bm \mu$ as 0 and only learn the variance of the Gaussian $\pi$-noise, which is visualized in the second row. The third row is a noise sample, and the fourth row contains the images with the $\pi$-noise added. Unlike in the other visualizations, the noise here is first normalized and then fed into the contrastive module. From the visualization, we find that the $\pi$-noise generator successfully learns effective visual augmentations (like style transfer) with only original images.
  • Figure 2: Framework of PiNDA. For simplicity, we assume that no other practicable augmentation is available, so the original sample and its $\pi$-noise augmentation are used for contrast. If more data augmentations are available, the $\pi$-noise can be viewed as one of them, and gradients are generated and backpropagated to the $\pi$-noise generator whenever the $\pi$-noise augmentation is sampled.
  • Figure 3: Illustration of the auxiliary Gaussian distribution. A smaller contrastive loss (i.e., a larger $\gamma_{\bm \theta^*}(\bm x, \bm \varepsilon)$) is converted to a Gaussian distribution with smaller variance (i.e., smaller entropy), and vice versa.
  • Figure 4: Visualization of different noise settings on SimCLR: In the second and third rows, we visualize the learnable $\bm \mu$ and $\bm \Sigma$ of Gaussian noise respectively. The fourth row shows the learned $\pi$-noise from the uniform distribution. The last row is the learned Gaussian noise with $\bm \mu=0$.
  • Figure 5: More visualizations of the $\pi$-noise learned by PiNDA trained with SimCLR on STL-10. We fix $\bm \mu$ as 0 and only learn the variance of the Gaussian $\pi$-noise, which is visualized in the second row. The third row is a sample from the learned $\pi$-noise distribution. Unlike in Figure 1, the noise is not normalized.
  • ...and 2 more figures
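
The setup described in the captions of Figures 1 and 2 (zero-mean Gaussian noise with a learned variance, added to the original sample and contrasted against it) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the identity encoder, the one-directional InfoNCE loss, the temperature value, and the fixed `log_sigma` values (standing in for the output of a trained noise generator) are all assumptions made for illustration. The view is written in the usual reparameterized form x + sigma * eps, the form that would let gradients reach a generator in a real training loop.

```python
import math
import random


def sample_pi_noise_view(x, log_sigma, rng):
    """Return x + sigma * eps with eps ~ N(0, I): zero-mean, diagonal Gaussian noise.

    log_sigma is a hypothetical stand-in for a noise generator's output;
    here it is just a list of fixed numbers, one per feature.
    """
    return [xi + math.exp(ls) * rng.gauss(0.0, 1.0) for xi, ls in zip(x, log_sigma)]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, views, temperature=0.5):
    """Mean InfoNCE loss where views[i] is the positive for anchors[i].

    All other views in the batch act as negatives. The encoder is assumed
    to be the identity, so similarities are computed on the raw vectors.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, v) / temperature for v in views]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch: the original samples and their noisy views are contrasted.
rng = random.Random(0)
batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
log_sigma = [[-3.0, -3.0, -3.0] for _ in batch]  # assumed small noise scale
views = [sample_pi_noise_view(x, ls, rng) for x, ls in zip(batch, log_sigma)]
loss = info_nce(batch, views)
```

In a full system, `log_sigma` would be predicted per sample by a trainable generator and optimized jointly with the contrastive model; the sketch only shows the forward computation for one batch.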