CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive Learning

Nazim Bendib

TL;DR

This work tackles the challenge of data augmentation in contrastive learning by proposing an adaptive augmentation framework that evolves during training without supervision. It introduces two policy families, IndepViews and CoViews, where CoViews learns dependent, cooperative augmentation for the two views, and a Bounded InfoNCE reward to guide policy search via PPO over a recurrent policy network. Empirical results across multiple vision datasets show consistent improvements in linear evaluation, with CoViews outperforming IndepViews and baselines, while maintaining minimal computational overhead. The approach broadens the applicability of adaptive augmentation to non-differentiable transforms and underscores the benefits of view-dependent augmentation strategies for more discriminative representations.
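
The Bounded InfoNCE reward is not given as a formula in this section; only its limiting behaviour is described (in the Figure 2 caption): a near-zero tolerance $b$ heavily penalizes subpolicies whose InfoNCE value exceeds the threshold $th$, and a very large $b$ yields a constant reward of $th$. The sketch below is one piecewise-linear shape that satisfies both limits. It is an assumption for illustration, not the paper's definition; in particular, returning the raw InfoNCE value below the threshold and the function name `bounded_reward` are hypothetical.

```python
def bounded_reward(infonce: float, th: float, b: float) -> float:
    """Hypothetical bounded reward consistent with the two limits in Figure 2.

    infonce: InfoNCE value produced by a sampled subpolicy pair.
    th:      threshold above which the reward stops growing.
    b:       tolerance (> 0) controlling how hard overshoot is penalized.
    """
    if b <= 0:
        raise ValueError("tolerance b must be positive")
    if infonce <= th:
        # Assumption: below the threshold the reward is the InfoNCE value itself.
        return infonce
    # Above the threshold, subtract the overshoot scaled by 1/b:
    #   b -> 0    gives an arbitrarily large penalty,
    #   b -> inf  gives a constant reward equal to th.
    return th - (infonce - th) / b


if __name__ == "__main__":
    # th=1.3 matches the Figure 2 caption; the b values are illustrative only.
    for b in (0.01, 0.2, 1e6):
        print(b, bounded_reward(1.5, th=1.3, b=b))
```

Capping the reward this way keeps the policy search from favouring augmentations that are hard only because they destroy the image content, which is the role the threshold plays in the caption.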

Abstract

Data augmentation plays a critical role in generating high-quality positive and negative pairs necessary for effective contrastive learning. However, common practices involve using a single augmentation policy repeatedly to generate multiple views, potentially leading to inefficient training pairs due to a lack of cooperation between views. Furthermore, to find the optimal set of augmentations, many existing methods require extensive supervised evaluation, overlooking the evolving nature of the model that may require different augmentations throughout the training. Other approaches train differentiable augmentation generators, thus limiting the use of non-differentiable transformation functions from the literature. In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead. Our approach continuously generates new data augmentation policies during training and produces effective positives/negatives without any supervision. Within this framework, we present two methods: IndepViews, which generates augmentation policies used across all views, and CoViews, which generates dependent augmentation policies for each view. This enables us to learn dependencies between the transformations applied to each view and ensures that the augmentation strategies applied to different views complement each other, leading to more meaningful and discriminative representations. Through extensive experimentation on multiple datasets and contrastive learning frameworks, we demonstrate that our method consistently outperforms baseline solutions and that training with a view-dependent augmentation policy outperforms training with an independent policy shared across views, showcasing its effectiveness in enhancing contrastive learning performance.

Paper Structure (36 sections, 6 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: The policy network begins by predicting $N_{\tau}=2$ operations and their corresponding magnitudes for the subpolicy of view 1, and then for the subpolicy of view 2. Each prediction (operation, magnitude) is added to an action history, which is then fed into the next time step as input. In CoViews, we pass the action history and context vector at the end of subpolicy 1 to the LSTM unit again (connection shown in red) to generate subpolicy 2. In the case of IndepViews, we don't pass the action history and context vector from subpolicy 1 to predict subpolicy 2 (the connection shown in red is removed); instead, we start with a new empty action history and a new context vector.
  • Figure 2: Comparison of Bounded InfoNCE rewards for varying tolerance $b$ values, while keeping the threshold constant at $th=1.3$. A near-zero tolerance aggressively penalizes subpolicies exceeding $th$, while a very large tolerance provides a constant reward equal to $th$ for surpassing subpolicies.
  • Figure 3: Comparison of the evolution of transformation probabilities in the learned adaptive augmentation policies between IndepViews and CoViews on the CIFAR-10 dataset.
  • Figure 4: A comparison between the co-occurrence matrix of transformations in view 1 and view 2 of both IndepViews and CoViews. Each value in the matrix represents the frequency of the co-occurrence of corresponding transformations across the two views.
  • Figure 5: Linear evaluation on CIFAR-10, SVHN, and STL10 for threshold values $th \in \{1.1, 1.3, 1.5, 1.7, 1.9\}$. All experiments use a tolerance value of 0.2.
  • ...and 1 more figure
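
The sampling procedure in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a random-weight tanh recurrent cell stands in for the LSTM, the action history is encoded as simple counts, and the operation list `OPS`, the magnitude bins `N_MAG`, and the size `HIDDEN` are hypothetical. Only $N_{\tau}=2$ and the carry-over versus reset of the action history and context vector come from the caption.

```python
import math
import random

OPS = ["rotate", "color_jitter", "solarize", "grayscale"]  # hypothetical operation set
N_MAG = 5    # hypothetical number of discrete magnitude bins
N_TAU = 2    # operations per subpolicy, as in the Figure 1 caption
HIDDEN = 8   # hypothetical context-vector size


def _matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPolicy:
    """Random-weight tanh recurrent cell standing in for the LSTM policy."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.w_in = _matrix(self.rng, HIDDEN, len(OPS) + N_MAG)
        self.w_h = _matrix(self.rng, HIDDEN, HIDDEN)
        self.w_op = _matrix(self.rng, len(OPS), HIDDEN)
        self.w_mag = _matrix(self.rng, N_MAG, HIDDEN)

    def step(self, history, context):
        # Encode the action history as counts of operations and magnitudes.
        x = [0.0] * (len(OPS) + N_MAG)
        for op, mag in history:
            x[op] += 1.0
            x[len(OPS) + mag] += 1.0
        pre = [a + b for a, b in zip(_matvec(self.w_in, x), _matvec(self.w_h, context))]
        context = [math.tanh(p) for p in pre]
        # Sample one operation and one magnitude from the two output heads.
        op_probs = _softmax(_matvec(self.w_op, context))
        mag_probs = _softmax(_matvec(self.w_mag, context))
        op = self.rng.choices(range(len(OPS)), weights=op_probs)[0]
        mag = self.rng.choices(range(N_MAG), weights=mag_probs)[0]
        return (op, mag), context


def sample_subpolicies(policy, cooperative):
    """Return [subpolicy_view1, subpolicy_view2], each a list of (op, magnitude)."""
    history, context = [], [0.0] * HIDDEN
    views = []
    for view in range(2):
        if view == 1 and not cooperative:
            # IndepViews: start view 2 from an empty history and a fresh context.
            history, context = [], [0.0] * HIDDEN
        subpolicy = []
        for _ in range(N_TAU):
            action, context = policy.step(history, context)
            history.append(action)  # each prediction is fed into the next step
            subpolicy.append(action)
        views.append(subpolicy)
    return views


if __name__ == "__main__":
    for name, coop in [("IndepViews", False), ("CoViews", True)]:
        v1, v2 = sample_subpolicies(ToyPolicy(seed=0), cooperative=coop)
        print(name, [(OPS[o], m) for o, m in v1], [(OPS[o], m) for o, m in v2])
```

The only difference between the two modes is the reset before view 2: with `cooperative=True` the second subpolicy is conditioned on what was chosen for the first, which is what allows dependencies between views to be learned.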