ASAM: Boosting Segment Anything Model with Adversarial Tuning

Bo Li; Haoke Xiao; Lv Tang

TL;DR

This paper tackles the challenge of improving the segmentation performance of the Segment Anything Model (SAM) on niche tasks without compromising its zero-shot generalization and without substantial additional data or architectural changes. It introduces ASAM, a framework that generates natural, photorealistic adversarial samples in three steps: projecting images into a diffusion latent space (via DDIM inversion and CLIP-guided prompts), optimizing latent perturbations under a segmentation loss, and enforcing mask-aligned reconstruction with ControlNet. Fine-tuning SAM on these adversarial samples updates only a tiny fraction of parameters (≈0.001%), yields an average improvement of about 1.3 mIoU across 14 unseen datasets, and transfers to EfficientSAM as well. The work provides a data-efficient, cross-disciplinary approach, borrowing adversarial training ideas from NLP and leveraging diffusion-based generative modeling, to enhance large vision foundation models in diverse real-world scenarios.
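
The latent-perturbation step can be illustrated with a toy sketch. It is not the authors' implementation: the real pipeline operates on Stable Diffusion latents and uses SAM as the segmenter, whereas here the "latent" is a short list of floats, the "segmenter" is a hypothetical per-element logistic model, and the perturbation is bounded by an L-infinity budget `eps` updated by sign-gradient ascent (a PGD-style choice assumed for illustration). All function names and parameter values are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def seg_loss(latent, mask, w=2.0, b=0.0):
    """Mean binary cross-entropy of a toy segmenter p = sigmoid(w*z + b)."""
    total = 0.0
    for z, m in zip(latent, mask):
        # Clamp the probability to keep the logarithms finite.
        p = min(max(sigmoid(w * z + b), 1e-12), 1.0 - 1e-12)
        total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
    return total / len(latent)

def seg_loss_grad(latent, mask, w=2.0, b=0.0):
    """Analytic gradient of seg_loss w.r.t. each latent element: w*(p - m)/n."""
    n = len(latent)
    return [w * (sigmoid(w * z + b) - m) / n for z, m in zip(latent, mask)]

def adversarial_latent(latent, mask, eps=0.3, step=0.05, iters=20):
    """Sign-gradient ascent on the loss with the perturbation clipped to [-eps, eps]."""
    delta = [0.0] * len(latent)
    for _ in range(iters):
        perturbed = [z + d for z, d in zip(latent, delta)]
        grad = seg_loss_grad(perturbed, mask)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            # Move in the direction that increases the loss, then clip.
            delta[i] = min(max(delta[i] + step * sign, -eps), eps)
    return [z + d for z, d in zip(latent, delta)]

# Toy data: two "foreground" and two "background" latent elements.
latent = [1.5, 0.8, -1.2, -0.4]
mask = [1, 1, 0, 0]
adv = adversarial_latent(latent, mask)
clean_loss = seg_loss(latent, mask)
adv_loss = seg_loss(adv, mask)
print(f"clean loss {clean_loss:.3f}, adversarial loss {adv_loss:.3f}")
```

The perturbed latent stays within `eps` of the original while raising the segmentation loss. The mask is left unchanged throughout, which mirrors the summary's requirement that adversarial samples remain aligned with the original annotations.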

Abstract

In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations than of conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is available at https://asam2024.github.io/.

Paper Structure (17 sections, 9 equations, 4 figures, 4 tables)

Introduction
Related Works
Segment Anything Model (SAM)
Adversarial Examples & Adversarial Training
Method
Overview
Adversarial Latent Optimization
Projecting Image to Diffusion Latent
Adversarial Optimization of Latent
Controllable Adversarial Samples Generation
Fine-tuning SAM with Adversarial Samples
Experiment
Experimental Setting
Quantitative and Qualitative Comparison
Ablation Studies
...and 2 more sections

Figures (4)

Figure 1: Performance comparison between ASAM and SAM on diverse segmentation datasets across different downstream tasks.
Figure 2: The architecture of our proposed ASAM framework. In the first step, we project the input image into the latent space and then optimize the latent with adversarial techniques. In the second step, we use the optimized latent to generate adversarial samples controlled by masks. Finally, we fine-tune SAM with the generated "natural" adversarial samples.
Figure 3: Qualitative comparison of the proposed ASAM and other methods. Yellow boxes represent the box prompts.
Figure 4: Adversarial examples comparison of ASAM and other attack methods.
