On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning

Yongyi Su, Yushu Li, Nanqing Liu, Kui Jia, Xulei Yang, Chuan-Sheng Foo, Xun Xu

TL;DR

The paper examines the adversarial risk of test-time adaptation (TTA) by proposing Realistic Test-Time Data Poisoning (RTTDP), a grey-box, online data-poisoning protocol in which the adversary has no access to other users' benign data or to the online model weights. It introduces a surrogate-model distillation mechanism and an in-distribution attack objective with feature distribution regularization, so that poisoned data crafted under RTTDP transfers to benign samples. Two TTA-specific poisoning objectives, Notch High Entropy (NHE) and Balanced Low Entropy (BLE), are proposed to exploit self-training dynamics. Extensive experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C show both the effectiveness and the limitations of RTTDP against state-of-the-art TTA methods, and examine defense strategies such as entropy thresholding and EMA. The findings indicate that prior claims of catastrophic vulnerability may be overstated under realistic constraints, and they offer concrete guidelines for designing adversarially robust TTA methods and defenses, with practical relevance for deployment in cloud-based services.

Abstract

Test-time adaptation (TTA) updates the model weights during the inference stage using testing data to enhance generalization. However, this practice exposes TTA to adversarial risks. Existing studies have shown that when TTA is updated with crafted adversarial test samples, also known as test-time poisoned data, the performance on benign samples can deteriorate. Nonetheless, the perceived adversarial risk may be overstated if the poisoned data is generated under overly strong assumptions. In this work, we first review realistic assumptions for test-time data poisoning, including white-box versus grey-box attacks, access to benign data, attack order, and more. We then propose an effective and realistic attack method that better produces poisoned samples without access to benign samples, and derive an effective in-distribution attack objective. We also design two TTA-aware attack objectives. Our benchmarks of existing attack methods reveal that the TTA methods are more robust than previously believed. In addition, we analyze effective defense strategies to help develop adversarially robust TTA methods. The source code is available at https://github.com/Gorilla-Lab-SCUT/RTTDP.
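
Prediction entropy is the quantity that self-training TTA, the entropy-named attack objectives (NHE, BLE), and the entropy-thresholding defense all revolve around. The following is a minimal sketch of how it can be computed and used to filter a batch before adaptation. It is not the authors' implementation: the function names and the threshold of `0.4 * log(num_classes)` are assumptions chosen for illustration, and the actual NHE and BLE losses are not defined in this excerpt.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_by_entropy(batch_logits, threshold):
    """Return indices of samples whose entropy is below the threshold.

    A generic entropy-thresholding filter: only confident (low-entropy)
    samples would be passed on to the adaptation step.
    """
    return [
        i
        for i, logits in enumerate(batch_logits)
        if prediction_entropy(logits) < threshold
    ]


if __name__ == "__main__":
    num_classes = 3
    # Hypothetical threshold, expressed as a fraction of the maximum entropy.
    threshold = 0.4 * math.log(num_classes)
    batch = [
        [8.0, 0.0, 0.0],   # confident prediction, low entropy
        [0.0, 0.0, 0.0],   # uniform prediction, maximum entropy
        [2.0, 1.9, -3.0],  # two classes nearly tied, high entropy
    ]
    for logits in batch:
        print(round(prediction_entropy(logits), 3))
    print(filter_by_entropy(batch, threshold))
```

Under this sketch, a poisoned sample with high prediction entropy would be dropped by the filter, whereas a confidently misclassified (low-entropy) sample would pass it, which is one way to read why the paper studies both a high-entropy and a low-entropy objective.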

Paper Structure (32 sections, 14 equations, 4 figures, 12 tables, 1 algorithm)

This paper contains 32 sections, 14 equations, 4 figures, 12 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of the proposed Realistic Test-Time Data Poisoning (RTTDP) pipeline. $\mathcal{B}_{ab}$ denotes the adversary's benign subset, and $\mathcal{B}_a$ denotes the adversary's poisoned subset, whose samples are poisoned versions of the clean samples in $\mathcal{B}_{ab}$. $\mathcal{B}_b$ denotes the benign users' subset; its samples are used to evaluate the adversarial risk of the TTA pipeline and cannot be accessed by the adversary. The adversary generates poisoned data by attacking a regularized objective without accessing benign samples from other users. The model is attacked when it carries out TTA on a test data stream that mixes benign and poisoned data.
  • Figure 2: (a) Comparison of attack performance for poisoned samples generated on the source (pretrained) model, our proposed surrogate model, and the online target model (white-box). (b) t-SNE visualization of the features (before the FC layer). Without $\mathcal{L}_{reg}$, common attack losses (e.g., maximizing cross-entropy) produce poisoned samples (orange dots) that lie far from benign ones (blue dots), leading to less effective attacks. (c) Comparison of attack performance without and with $\mathcal{L}_{reg}$. (d) Average prediction entropy of the poisoned samples generated by each of our two proposed attack objectives.
  • Figure 3: Illustration of test-time data poisoning batch split.
  • Figure 4: Visualization of selected samples before and after test-time data poisoning.
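
The batch split described in the captions of Figures 1 and 3 can be sketched as follows. This is an illustrative reading of the captions, not the authors' code: the function names, the `adversary_fraction` parameter, and the toy perturbation are all hypothetical. The point it shows is structural: the adversary only ever touches its own subset $\mathcal{B}_{ab}$, turns it into $\mathcal{B}_a$, and the TTA model then adapts on a stream mixing $\mathcal{B}_a$ with the untouched $\mathcal{B}_b$.

```python
import random


def split_test_batch(batch, adversary_fraction, seed=0):
    """Split a test batch into (B_ab, B_b).

    B_ab: clean samples owned by the adversary (to be poisoned).
    B_b:  benign users' samples, never visible to the adversary.
    """
    if not 0.0 <= adversary_fraction <= 1.0:
        raise ValueError("adversary_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    indices = list(range(len(batch)))
    rng.shuffle(indices)
    n_adv = int(round(adversary_fraction * len(batch)))
    adv_idx = sorted(indices[:n_adv])
    benign_idx = sorted(indices[n_adv:])
    b_ab = [batch[i] for i in adv_idx]
    b_b = [batch[i] for i in benign_idx]
    return b_ab, b_b


def poison(b_ab, perturb):
    """Produce B_a by perturbing each clean adversary sample in B_ab."""
    return [perturb(x) for x in b_ab]


def build_tta_stream(b_a, b_b, seed=0):
    """Mix poisoned and benign samples; each item is (sample, is_poisoned)."""
    mixed = [(x, True) for x in b_a] + [(x, False) for x in b_b]
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy "samples": scalar pixel intensities in [0, 1].
    batch = [i / 10 for i in range(10)]
    b_ab, b_b = split_test_batch(batch, adversary_fraction=0.2)

    # Placeholder perturbation (hypothetical): bounded shift, clipped to [0, 1].
    epsilon = 0.03
    b_a = poison(b_ab, lambda x: min(1.0, max(0.0, x + epsilon)))

    stream = build_tta_stream(b_a, b_b)
    print(len(b_a), len(b_b), len(stream))
```

Attack success in this setting would be measured on the benign subset only, since those are the samples the captions describe as validating the adversarial risk of the TTA pipeline.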