Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting

Nikoo Naghavian; Mostafa Tavassolipour

TL;DR

Problem: zero-shot robustness of vision-language models (e.g., CLIP) under imperceptible adversarial perturbations. Approach: CAW combines a Confidence-Aware loss $L_{CA}$, defined via the KL divergence between the predictions $P^{\text{adv}}$ and $P^{\text{clean}}$, with a regularization term $L_{Reg}$ that aligns adversarial image features from the fine-tuned and frozen encoders; training optimizes $L_{total}=L_{CE}+\alpha L_{CA}+\beta L_{Reg}$. Contributions: (1) improves robustness against AutoAttack on TinyImageNet and 14 zero-shot datasets; (2) outperforms PMG-AFT and TGA-ZSR under PGD-100 and CW, with reduced memory usage; (3) demonstrates strong generalization across diverse datasets. Significance: enables safer deployment of vision-language models by achieving higher robust and clean accuracy with a lower memory footprint.

Abstract

Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work, we propose Confidence-Aware Weighting (CAW) to enhance zero-shot robustness in vision-language models. CAW consists of two components: (1) a Confidence-Aware loss that prioritizes uncertain adversarial examples by scaling the KL divergence between clean and adversarial predictions, and (2) a feature alignment regularization that preserves semantic consistency by minimizing the distance between frozen and fine-tuned image encoder features on adversarial inputs. These components work jointly to improve both clean and robust accuracy without sacrificing generalization. Extensive experiments on TinyImageNet and 14 additional datasets show that CAW outperforms recent methods such as PMG-AFT and TGA-ZSR under strong attacks like AutoAttack, while using less memory.
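To make the structure of the objective $L_{total}=L_{CE}+\alpha L_{CA}+\beta L_{Reg}$ concrete, the following is a minimal single-example sketch in plain Python. It is not the authors' implementation. Several details are assumptions, because this summary does not state them: the confidence weight is taken as one minus the maximum adversarial probability, the KL divergence is taken from clean to adversarial predictions, the cross-entropy is computed on the adversarial logits, and the feature distance is a squared L2 norm. The function name `caw_total_loss` and all argument names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def caw_total_loss(clean_logits, adv_logits, label,
                   feat_tuned, feat_frozen, alpha=1.0, beta=1.0):
    """Illustrative L_total = L_CE + alpha * L_CA + beta * L_Reg for one example."""
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)

    # ASSUMPTION: cross-entropy is taken on the adversarial prediction.
    l_ce = -math.log(p_adv[label] + 1e-12)

    # ASSUMPTION: the weight grows as the adversarial prediction becomes less
    # confident, so uncertain adversarial examples contribute more.
    weight = 1.0 - max(p_adv)
    # ASSUMPTION: KL direction is clean -> adversarial.
    l_ca = weight * kl_divergence(p_clean, p_adv)

    # ASSUMPTION: squared L2 distance between fine-tuned and frozen encoder
    # features, both computed on the adversarial input.
    l_reg = sum((a - b) ** 2 for a, b in zip(feat_tuned, feat_frozen))

    total = l_ce + alpha * l_ca + beta * l_reg
    return total, (l_ce, l_ca, l_reg)


# Hypothetical toy inputs: three classes and two-dimensional features.
total, (l_ce, l_ca, l_reg) = caw_total_loss(
    clean_logits=[2.0, 0.0, 0.0],
    adv_logits=[0.0, 2.0, 0.0],
    label=0,
    feat_tuned=[1.0, 0.0],
    feat_frozen=[0.0, 0.0],
    alpha=1.0,
    beta=0.5,
)
```

In this sketch, $L_{CA}$ vanishes when clean and adversarial predictions agree, and $L_{Reg}$ vanishes when the fine-tuned encoder reproduces the frozen encoder's features. That matches the roles the abstract assigns to the two terms, although the weighting and distance actually used in the paper may differ.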
