SynCo: Synthetic Hard Negatives for Contrastive Visual Representation Learning

Nikolaos Giakoumoglou, Tania Stathaki

TL;DR

The paper addresses the challenge of efficiently leveraging hard negatives in contrastive visual representation learning. It proposes SynCo, which applies six strategies to generate synthetic hard negatives on-the-fly from a memory queue and integrates them into the InfoNCE loss within the MoCo framework. Empirical results show improvements in ImageNet linear evaluation, semi-supervised learning, and transfer to detection on PASCAL VOC and COCO, with faster convergence and stronger representations. The authors suggest that the approach could generalize to other domains and contrastive learning setups at minimal additional computational cost.

Abstract

Contrastive learning has become a dominant approach in self-supervised visual representation learning, but efficiently leveraging hard negatives, which are samples closely resembling the anchor, remains challenging. We introduce SynCo (Synthetic negatives in Contrastive learning), a novel approach that improves model performance by generating synthetic hard negatives in the representation space. Building on the MoCo framework, SynCo introduces six strategies for creating diverse synthetic hard negatives on-the-fly with minimal computational overhead. SynCo achieves faster training and strong representation learning, surpassing MoCo-v2 by +0.4% and MoCHI by +1.0% on ImageNet ILSVRC-2012 linear evaluation. It also transfers more effectively to detection tasks, achieving strong results on PASCAL VOC detection (57.2% AP) and significantly improving over MoCo-v2 on COCO detection (+1.0% AP) and instance segmentation (+0.8% AP). Our synthetic hard negative generation approach significantly enhances visual representations learned through self-supervised contrastive learning.
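
To make the mechanism concrete, the following is a minimal sketch of how synthetic hard negatives might enter the InfoNCE loss alongside queue negatives. It is not the authors' implementation: the six SynCo strategies are not specified in this summary, so `interpolate` is a hypothetical stand-in that simply mixes the two hardest queue negatives. The function names, the toy vectors, and the temperature value of 0.2 are likewise assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def hardest_negatives(q, queue, top_k):
    """Return the top_k queue entries most similar to the query q."""
    return sorted(queue, key=lambda n: dot(q, n), reverse=True)[:top_k]


def interpolate(n_i, n_j, alpha=0.5):
    """Hypothetical stand-in strategy (an assumption, not taken from the paper):
    mix two hard negatives and renormalize the result."""
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(n_i, n_j)]
    return l2_normalize(mixed)


def info_nce(q, k, negatives, temperature=0.2):
    """InfoNCE loss for one query: negative log-softmax of the positive logit."""
    logits = [dot(q, k) / temperature]
    logits += [dot(q, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy features: one query, its positive key, and a four-entry memory queue.
q = l2_normalize([1.0, 0.2, 0.0])
k = l2_normalize([0.9, 0.3, 0.1])
queue = [
    l2_normalize(v)
    for v in ([0.8, 0.6, 0.0], [0.7, 0.0, 0.7], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
]

hard = hardest_negatives(q, queue, top_k=2)
synthetic = [interpolate(hard[0], hard[1])]

loss_queue_only = info_nce(q, k, queue)
loss_with_synthetic = info_nce(q, k, queue + synthetic)  # concatenated negatives
```

Appending any extra negative adds a positive term to the softmax denominator, so `loss_with_synthetic` is strictly larger than `loss_queue_only` for the same query. In this sketch that is the only sense in which the synthetic negatives make the task harder.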

Paper Structure (28 sections, 11 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: SynCo extends MoCo (He et al., 2020; Chen et al., 2020) by introducing synthetic hard negatives generated on-the-fly from a memory queue. The process begins with two augmented views of an image, $\mathbf{x}_q$ and $\mathbf{x}_k$, processed by an encoder and a momentum encoder, respectively, producing feature vectors $\mathbf{q}$ and $\mathbf{k}$. The memory queue holds negative samples $\mathbf{n}_1, \mathbf{n}_2, \ldots$, which are concatenated with synthetic hard negatives $\mathbf{s}_1, \mathbf{s}_2, \ldots$ generated using the SynCo strategies. These combined negatives are used to compute the affinity matrix, which, together with the positive pair (query $\mathbf{q}$ and key $\mathbf{k}$), contributes to the InfoNCE loss calculation.
  • Figure 2: Histogram of the top 1024 matching probabilities $p_{z_i}$, $z_i \in \mathcal{Q}$ for MoCo-v2, over various epochs. Logits are organized in descending order, and each line indicates the mean matching probability across all queries (Kalantidis et al., 2020).
  • Figure 3: Performance comparison of MoCo, MoCo-v2, MoCHI, and SynCo (under various configurations) on ImageNet-100 in terms of accuracy on the proxy task (percentage of queries where the key is ranked higher than all negatives).
  • Figure 4: Performance comparison of MoCo-v2, MoCHI, and SynCo (under various configurations) on ImageNet-100 in terms of alignment and uniformity metrics. The x-axis and y-axis represent $-\mathcal{L}_{\text{uniform}}$ and $-\mathcal{L}_{\text{align}}$, respectively. The model with the highest performance is located in the upper-right corner of the chart.
  • Figure 5: Distribution of the ratio between inter-class and intra-class distances for MoCo-based methods. Higher values indicate better class separation. For clarity, we only show MoCo-v2 (Chen et al., 2020; 800 epochs), PCL-v2 (Li et al., 2021; 200 epochs), and SynCo (800 epochs).
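
The quantities plotted in Figures 3 and 4 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' evaluation code: proxy-task accuracy follows the definition in the Figure 3 caption, while the alignment and uniformity losses use the commonly cited formulation (mean squared distance between positive pairs, and the log of the mean Gaussian potential over distinct pairs with `t=2.0`), which this summary does not spell out. All function names and toy vectors are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proxy_accuracy(queries, keys, negatives):
    """Fraction of queries whose key is ranked above every negative."""
    hits = 0
    for q, k in zip(queries, keys):
        if all(dot(q, k) > dot(q, n) for n in negatives):
            hits += 1
    return hits / len(queries)


def align_loss(queries, keys):
    """Assumed form: mean squared distance between positive pairs (lower is better)."""
    return sum(sq_dist(q, k) for q, k in zip(queries, keys)) / len(queries)


def uniform_loss(features, t=2.0):
    """Assumed form: log of the mean Gaussian potential over distinct pairs
    (lower is better)."""
    pairs = [(a, b) for i, a in enumerate(features) for b in features[i + 1:]]
    mean_potential = sum(math.exp(-t * sq_dist(a, b)) for a, b in pairs) / len(pairs)
    return math.log(mean_potential)


# Toy unit vectors in 2D.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[0.8, 0.6], [0.6, 0.8]]
negatives = [[-1.0, 0.0], [0.0, -1.0], [0.96, 0.28]]

accuracy = proxy_accuracy(queries, keys, negatives)
alignment = align_loss(queries, keys)

spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
clustered = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
uniform_spread = uniform_loss(spread)
uniform_clustered = uniform_loss(clustered)
```

Here the first query is beaten by the negative `[0.96, 0.28]`, so only one of the two queries counts as a hit. Figure 4 plots the negated losses, which is why lower values of `align_loss` and `uniform_loss` correspond to the upper-right corner of that chart.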