DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You

TL;DR

DiffAug tackles the challenge of designing effective data augmentations for unsupervised contrastive learning without relying on domain expertise or labeled data. It introduces a semantic encoder and a conditional diffusion generator that are trained iteratively to produce semantically consistent positive samples, guided by a soft-contrastive learning objective and a diffusion loss. Across DNA sequences, bio-features, and vision datasets, DiffAug delivers consistent improvements over hand-crafted augmentations and state-of-the-art model-based methods, highlighting the method's versatility and effectiveness on limited data. The approach advances unsupervised representation learning by enabling diffusion-based augmentation that is domain-agnostic, scalable, and practically impactful for diverse data regimes.

Abstract

Unsupervised contrastive learning has gained prominence in fields such as vision and biology, leveraging predefined positive/negative samples for representation learning. Data augmentation, categorized into hand-designed and model-based methods, has been identified as a crucial component for enhancing contrastive learning. However, hand-designed methods require human expertise in domain-specific data and sometimes distort the meaning of the data. In contrast, generative model-based approaches usually require supervision or large-scale external data, which has become a bottleneck constraining model training in many domains. To address these problems, this paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion model-based positive data generation. DiffAug consists of a semantic encoder and a conditional diffusion model; the conditional diffusion model generates new positive samples conditioned on the semantic encoding to serve the training of unsupervised contrastive learning. Through iterative training of the semantic encoder and the diffusion model, DiffAug improves its representation ability in an uninterrupted and unsupervised manner. Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets. The code for review is released at https://github.com/zangzelin/code_diffaug.
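
The iterative training described in the abstract can be pictured as an alternating schedule. The sketch below is only illustrative: the function name `alternate_training`, the two step callbacks, the epoch counts, and the choice to freeze one component while the other is updated are all assumptions, not details taken from the paper or its released code.

```python
def alternate_training(train_encoder_step, train_generator_step,
                       rounds=3, encoder_epochs=2, generator_epochs=2):
    """Alternate encoder and generator updates and return a log of phases.

    Both callbacks are hypothetical placeholders for one epoch of training.
    """
    log = []
    for r in range(rounds):
        for _ in range(encoder_epochs):
            # Assumed phase 1: update the semantic encoder using positives
            # produced by the (currently fixed) diffusion generator.
            train_encoder_step()
            log.append((r, "encoder"))
        for _ in range(generator_epochs):
            # Assumed phase 2: update the conditional diffusion generator,
            # conditioned on encodings from the (currently fixed) encoder.
            train_generator_step()
            log.append((r, "generator"))
    return log
```

Passing two no-op callbacks is enough to inspect the resulting schedule before plugging in real training steps.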

Paper Structure

This paper contains 32 sections, 1 theorem, 26 equations, 7 figures, 11 tables, 1 algorithm.

Key Result

Lemma 2.1

Let $\mathcal{L}_{\mathrm{cl}} = -\log \mathcal{Q}\left(\mathbf{z}_{i}, \mathbf{z}_{i}^{+}\right) + \log\left[\mathcal{Q}\left(\mathbf{z}_{i}, \mathbf{z}_{i}^{+}\right) + \sum_{\mathbf{z}_{i}^{-} \in V^{-}} \mathcal{Q}\left(\mathbf{z}_{i}, \mathbf{z}_{i}^{-}\right)\right]$ and …
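
As a rough numerical illustration of the contrastive term $\mathcal{L}_{\mathrm{cl}}$ in Lemma 2.1, the sketch below evaluates it for a single anchor. The lemma statement shown here does not define $\mathcal{Q}$, so the kernel used (the exponential of cosine similarity divided by a temperature) is an assumption, as are the function names and the temperature value.

```python
import numpy as np

def similarity_kernel(a, b, temperature=0.5):
    """Assumed stand-in for Q: exp(cosine similarity / temperature)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.exp(cos / temperature)

def contrastive_loss(z, z_pos, z_negs, kernel=similarity_kernel):
    """L_cl = -log Q(z, z+) + log[Q(z, z+) + sum over z- of Q(z, z-)]."""
    q_pos = kernel(z, z_pos)
    q_neg = sum(kernel(z, z_neg) for z_neg in z_negs)
    return -np.log(q_pos) + np.log(q_pos + q_neg)

anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_aligned = contrastive_loss(anchor, [1.0, 0.0], negatives)     # positive matches anchor
loss_misaligned = contrastive_loss(anchor, [0.0, 1.0], negatives)  # positive is orthogonal
```

Under this assumed kernel the loss is smaller when the positive sample is aligned with the anchor than when it is orthogonal, and it vanishes when there are no negatives.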

Figures (7)

  • Figure 1: Comparison of DiffAug with existing augmentation strategies. (a) Hand-designed augmentation relies on human priors to generate new data with different features but similar semantics. (b) Model-based augmentation methods generate new data with the same labels by training generative models on large amounts of data and labels. These methods often require large amounts of data and target specific data domains. (c) DiffAug attempts to reduce the dependence on external data and prior knowledge through iterative training of the encoder and the diffusion model, expanding the application areas of unsupervised CL.
  • Figure 2: The DiffAug framework and training strategy. DiffAug includes a semantic encoder $\text{Enc}(\cdot|\theta)$ and a diffusion generator $\text{Gen}(\cdot|\phi)$. (a) shows how $\text{Enc}(\cdot|\theta)$ and $\text{Gen}(\cdot|\phi)$ are iteratively trained. (b) and (c) show how the loss functions are calculated. (d) shows how new augmentation data are generated with the trained model.
  • Figure 3: Original and generated images, illustrating that DiffAug generates semantically similar augmented data. 'Ori' denotes the original data, and Aug1, Aug2, and Aug3 are augmented data. For bio-feature data, we use violin plots to show the distribution of features.
  • Figure 4: Scatter visualization of the representations, indicating that DiffAug's encoder learns a cleaner embedding. The colors represent different categories; there are 100 categories in CF100, for which we use the superclass labels provided by deng2021flattening.
  • Figure 5: Histogram of the cosine similarity between original data and augmented data in latent space, indicating that DiffAug generates semantically smooth augmentations. For the image data, we compare similar mixups with random cropping. For bio-feature datasets, we compare same-label Mixup and random dimension swapping.
  • ...and 2 more figures
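
The latent-space comparison described for Figure 5 can be sketched as follows. This is a minimal illustration on synthetic data: the embeddings are random stand-ins rather than outputs of the DiffAug encoder, and the array sizes, noise level, and bin count are arbitrary assumptions.

```python
import numpy as np

def pairwise_cosine(orig, aug):
    """Cosine similarity between matching rows of two (n, d) arrays."""
    orig = np.asarray(orig, dtype=float)
    aug = np.asarray(aug, dtype=float)
    num = np.sum(orig * aug, axis=1)
    den = np.linalg.norm(orig, axis=1) * np.linalg.norm(aug, axis=1)
    return num / den

rng = np.random.default_rng(0)
# Stand-in embeddings of original samples (assumed shape: 1000 x 16).
z_orig = rng.normal(size=(1000, 16))
# Stand-in embeddings of augmented samples: originals plus small noise.
z_aug = z_orig + 0.1 * rng.normal(size=(1000, 16))

sims = pairwise_cosine(z_orig, z_aug)
# Bin the similarities over [-1, 1], as one would before drawing a histogram.
counts, edges = np.histogram(sims, bins=20, range=(-1.0, 1.0))
```

With real data, `z_orig` and `z_aug` would be replaced by encoder outputs for each sample and its augmentation, and one histogram would be computed per augmentation method being compared.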

Theorems & Definitions (2)

  • Lemma 2.1
  • proof