ANCHOR: Integrating Adversarial Training with Hard-mined Supervised Contrastive Learning for Robust Representation Learning

Samarup Bhattacharya, Anubhab Bhattacharya, Abir Chakraborty

TL;DR

The paper tackles the vulnerability of vision models to adversarial perturbations by proposing ANCHOR, a framework that blends adversarial training with hard-positive mined supervised contrastive learning to shape robust, discriminative representations. It integrates a hardness-weighted supervised contrastive loss with adversarial cross-entropy, using a two-view training regime (augmented and adversarial) and a modified ResNet-18 backbone, with a projection head for training and a frozen backbone for finetuning. Empirical results on CIFAR-10 show ANCHOR achieving higher robust accuracies under PGD-20 and AutoAttack while preserving competitive clean accuracy, outperforming several baselines. The work suggests that robust generalization emerges from semantically aligned, hard-positive-aware embedding spaces, and demonstrates practical gains for robust image representation learning.
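As a rough illustration of what a hardness-weighted supervised contrastive loss over two views could look like, the NumPy sketch below weights each positive pair by how dissimilar it currently is to the anchor. The weighting rule (an exponential of `1 - cosine similarity`), the function name `hard_weighted_supcon`, and the parameters `temperature` and `beta` are assumptions made for this example; the summary above does not specify the exact form used in ANCHOR.

```python
import numpy as np

def hard_weighted_supcon(z, labels, temperature=0.1, beta=1.0):
    """Supervised contrastive loss with hardness-weighted positives.

    z:      (N, D) embeddings, e.g. augmented and adversarial views stacked.
    labels: (N,) integer class labels.
    beta:   hypothetical knob; beta=0 gives uniform weights over positives.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalise
    sim = z @ z.T                                      # cosine similarities
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self

    # Log-probability of each candidate under a softmax over all non-self pairs.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits) * not_self
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))

    # Assumed hardness weights: less similar positives receive more weight.
    w = np.exp(beta * (1.0 - sim)) * pos
    w_sum = w.sum(axis=1, keepdims=True)
    has_pos = w_sum[:, 0] > 0
    w = w / np.where(w_sum > 0, w_sum, 1.0)

    per_anchor = -(w * log_prob).sum(axis=1)
    return per_anchor[has_pos].mean()   # average over anchors with positives
```

For a two-view batch, one would stack the embeddings of both views and repeat the labels, for example `hard_weighted_supcon(np.concatenate([z_aug, z_adv]), np.concatenate([y, y]))`, and add the result to the adversarial cross-entropy term with some weighting factor. How the two terms are balanced is not stated in this summary.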

Abstract

Neural networks have changed the way machines interpret the world. At their core, they learn by following gradients, adjusting their parameters step by step until they identify the most discriminative patterns in the data. This process gives them their strength, yet it also opens the door to a hidden flaw: the very gradients that help a model learn can also be used to craft small, imperceptible perturbations that cause the model to completely alter its decision. Such adversarial attacks leave an image visually unchanged to the human eye while causing the model to make wrong predictions. In this work, we propose Adversarially-trained Contrastive Hard-mining for Optimized Robustness (ANCHOR), a framework that combines supervised contrastive learning with explicit hard-positive mining. The model learns representations in which the embeddings of an image, its augmentations, and its perturbed versions cluster together with those of other images of the same class, while remaining separated from images of other classes. This alignment helps the model focus on stable, meaningful patterns rather than fragile gradient cues. On CIFAR-10, our approach achieves strong clean and robust accuracy under PGD-20 (epsilon = 0.031), outperforming standard adversarial training methods. Our results indicate that combining adversarial guidance with hard-mined contrastive supervision helps models learn more structured and robust representations, narrowing the gap between accuracy and robustness.
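To make the idea of gradient-crafted perturbations concrete, here is a minimal sketch of an L-infinity PGD attack with 20 steps and epsilon = 0.031, matching the evaluation setting named in the abstract. To keep the gradient analytic, it attacks a linear softmax classifier rather than the ResNet-18 used in the paper. The step size `alpha`, the random start, the [0, 1] pixel range, and all function names are assumptions for this example, not details taken from the paper.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ce_loss_and_input_grad(W, b, x, y):
    """Cross-entropy of a linear softmax classifier and its gradient w.r.t. x."""
    p = softmax(x @ W + b)
    n = x.shape[0]
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    onehot = np.eye(W.shape[1])[y]
    grad_x = (p - onehot) @ W.T / n      # d(loss)/d(x) for logits = x @ W + b
    return loss, grad_x

def pgd_linf(W, b, x, y, eps=0.031, alpha=0.007, steps=20, seed=0):
    """Signed-gradient ascent on the loss, projected onto the eps-ball around x."""
    rng = np.random.default_rng(seed)
    # Assumed random start inside the eps-ball.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        _, g = ce_loss_and_input_grad(W, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)          # step that increases the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep a valid pixel range
    return x_adv
```

Every coordinate of the returned input stays within `eps` of the original, which is what keeps the perturbation visually negligible. In adversarial training, inputs produced this way would serve as the adversarial view fed back to the model during each update.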

Paper Structure

This paper contains 14 sections, 7 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Workflow of the proposed pipeline