FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models

Yanting Wang, Wei Zou, Jinyuan Jia

TL;DR

FCert is the first certified defense against data poisoning attacks on few-shot classification built on foundation-model features. It introduces a robust distance, computed by discarding a fraction of the extreme per-class distances, and proves a tight certified poisoning size $T^*$ via upper and lower bounds with a binary-search procedure. Experiments on vision and NLP datasets with CLIP, DINOv2, PaLM-2, and OpenAI models show that FCert preserves accuracy without attacks, outperforms existing certified defenses under poisoning, and remains computationally efficient. These formal guarantees make few-shot classifiers more dependable in security-critical settings without sacrificing practical performance.
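
The robust distance can be illustrated with a minimal sketch. It assumes the robust distance is a trimmed mean (drop the `trim` smallest and `trim` largest per-class distances, then average the rest) and that the predicted label is the class with the smallest robust distance. The function names, the Euclidean metric, and the toy feature vectors are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def robust_distance(distances: Sequence[float], trim: int) -> float:
    """Trimmed mean: drop the `trim` smallest and `trim` largest values."""
    k = len(distances)
    if trim < 0 or 2 * trim >= k:
        raise ValueError("need 0 <= trim < K / 2")
    kept = sorted(distances)[trim:k - trim]
    return sum(kept) / len(kept)


def predict(test_feature: Vector,
            support: Dict[str, List[Vector]],
            trim: int) -> Tuple[str, Dict[str, float]]:
    """Return the label with the smallest robust distance, plus all scores."""
    scores = {
        label: robust_distance(
            [math.dist(test_feature, f) for f in features], trim)
        for label, features in support.items()
    }
    return min(scores, key=scores.get), scores


# Toy 2-way-3-shot support set with one poisoned feature per class
# (hypothetical 2-D features standing in for foundation-model outputs).
test_feature = (0.0, 0.0)
support = {
    "dog": [(1.0, 0.0), (0.0, 1.0), (50.0, 50.0)],  # last one pushed far away
    "cat": [(0.0, 0.0), (5.0, 0.0), (0.0, 6.0)],    # first one moved onto the test point
}

label_trimmed, _ = predict(test_feature, support, trim=1)
label_plain, _ = predict(test_feature, support, trim=0)
print(label_trimmed, label_plain)  # prints "dog cat"
```

With `trim=1` the two poisoned distances are discarded and the prediction stays "dog"; with `trim=0` (a plain mean) the same two poisoned samples flip the prediction to "cat".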

Abstract

Few-shot classification with foundation models (e.g., CLIP, DINOv2, PaLM-2) enables users to build an accurate classifier with a few labeled training samples (called support samples) for a classification task. However, an attacker could perform data poisoning attacks by manipulating some support samples such that the classifier makes the attacker-desired, arbitrary prediction for a testing input. Empirical defenses cannot provide formal robustness guarantees, leading to a cat-and-mouse game between the attacker and defender. Existing certified defenses are designed for traditional supervised learning, resulting in sub-optimal performance when extended to few-shot classification. In our work, we propose FCert, the first certified defense against data poisoning attacks to few-shot classification. We show our FCert provably predicts the same label for a testing input under arbitrary data poisoning attacks when the total number of poisoned support samples is bounded. We perform extensive experiments on benchmark few-shot classification datasets with foundation models released by OpenAI, Meta, and Google in both vision and text domains. Our experimental results show our FCert: 1) maintains classification accuracy without attacks, 2) outperforms existing state-of-the-art certified defenses for data poisoning attacks, and 3) is efficient and general.

Paper Structure (29 sections, 2 theorems, 15 equations, 14 figures, 2 tables, 3 algorithms)

This paper contains 29 sections, 2 theorems, 15 equations, 14 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

Suppose we have a clean support set $\mathcal{D}$ for $C$-way-$K$-shot classification, where $\mathcal{D}^c \subset \mathcal{D}$ is a subset of $K$ support samples in $\mathcal{D}$ whose labels are $c$ ($c=1,2,\cdots, C$). Given a foundation model $g$ and a distance metric $Dist$, we denote by $d_i^c$ [...] where $\mathcal{M}(\bm{x}_{test};\mathcal{D})$ represents the predicted label of our FCert for $\bm{x}_{test}$ [...]
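
To make the idea of a certified poisoning size concrete, here is a hedged sketch that is not a transcription of Theorem 1. It assumes a trimmed-mean robust distance with `trim` values dropped at each end, and an attacker who replaces at most `t` support samples per class with `t <= trim`. Under that assumption, replacing `t` values can shift each order statistic by at most `t` positions, so sliding the kept window up or down by `t` gives sound upper and lower bounds. The prediction is certified when the predicted class's upper bound stays strictly below every other class's lower bound, and a binary search finds the largest such `t`. All names and the toy distances are hypothetical, and the sketch makes no claim of tightness.

```python
from typing import Dict, Sequence, Tuple


def trimmed_mean_bounds(distances: Sequence[float], trim: int,
                        t: int) -> Tuple[float, float]:
    """Bounds on the trimmed mean if at most t distances are replaced."""
    k = len(distances)
    if 2 * trim >= k or not 0 <= t <= trim:
        raise ValueError("need 2 * trim < K and 0 <= t <= trim")
    d = sorted(distances)
    n = k - 2 * trim
    lower = sum(d[trim - t:k - trim - t]) / n  # kept window shifted down by t
    upper = sum(d[trim + t:k - trim + t]) / n  # kept window shifted up by t
    return lower, upper


def is_certified(dists: Dict[str, Sequence[float]], trim: int, t: int) -> bool:
    """True if the clean prediction provably survives t poisoned samples per class."""
    clean = {c: trimmed_mean_bounds(d, trim, 0)[0] for c, d in dists.items()}
    predicted = min(clean, key=clean.get)
    worst_predicted = trimmed_mean_bounds(dists[predicted], trim, t)[1]
    return all(
        worst_predicted < trimmed_mean_bounds(d, trim, t)[0]
        for c, d in dists.items() if c != predicted
    )


def certified_poisoning_size(dists: Dict[str, Sequence[float]], trim: int) -> int:
    """Largest t in [0, trim] that is certified, found by binary search."""
    lo, hi = 0, trim  # t = 0 is trivially safe: no sample is changed
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if is_certified(dists, trim, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


# Toy 2-way-5-shot distances between the test input and each support sample.
well_separated = {
    "dog": [1.0, 1.2, 1.1, 0.9, 1.3],
    "cat": [5.0, 5.5, 4.8, 6.0, 5.2],
}
fragile = {
    "dog": [1.0, 1.2, 1.1, 0.9, 9.0],
    "cat": [1.5, 5.5, 4.8, 6.0, 5.2],
}
print(certified_poisoning_size(well_separated, trim=2))  # prints 2
print(certified_poisoning_size(fragile, trim=2))         # prints 1
```

Binary search is valid here because the upper bound can only grow and the lower bound can only shrink as `t` increases, so certification is monotone in `t`.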

Figures (14)

  • Figure 1: Illustration of few-shot classification with a foundation model using a linear classifier. An attacker could manipulate the classification boundary of the linear classifier by poisoning one support sample. The testing input is correctly classified as "dog" before attack and is misclassified as "cat" after attack.
  • Figure 2: Overview of FCert under Individual Attack. We have three support samples for each of the two classes (i.e., $2$-way-$3$-shot classification). An attacker could poison one support sample for each class, where the feature vectors with red color are for poisoned support samples. ${\color{red}d_1^{cat}}, d_2^{cat}, d_3^{cat}$ (or $d_1^{dog}, d_2^{dog}, {\color{red}d_3^{dog}}$) are distances between the feature vectors of three support samples whose labels are "cat" (or "dog") and the testing input, which are used to compute two robust distances $R^{cat}$ and $R^{dog}$. Our FCert still predicts the correct label "dog" for the testing input under two poisoned support samples.
  • Figure 3: Comparing the certified accuracy of FCert with existing provable defenses (or empirical accuracy of existing few-shot learning methods) for $C$-way-$K$-shot few-shot classification with CLIP. The attack type is individual attack. $K$ = 5, $C$ = 5 (first row); $K$ = 10, $C$ = 10 (second row); $K$ = 15, $C$ = 15 (third row). $T$ is poisoning size.
  • Figure 4: Comparing the certified accuracy of FCert with existing provable defenses (or empirical accuracy of existing few-shot learning methods) for $C$-way-$K$-shot few-shot classification with DINOv2. The attack type is individual attack. $K$ = 5, $C$ = 5 (first row); $K$ = 10, $C$ = 10 (second row); $K$ = 15, $C$ = 15 (third row). $T$ is poisoning size.
  • Figure 5: Comparing the certified accuracy of FCert with existing provable defenses (or empirical accuracy of existing few-shot learning methods) for $C$-way-$K$-shot few-shot classification with CLIP. The attack type is group attack. $K$ = 5, $C$ = 5 (first row); $K$ = 10, $C$ = 10 (second row); $K$ = 15, $C$ = 15 (third row). $T$ is poisoning size.
  • ...and 9 more figures

Theorems & Definitions (3)

  • Example 1
  • Theorem 1: Certified Poisoning Size
  • Theorem 2: Tightness of Our Certified Poisoning Size