
BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors

Chia-Yi Hsu, Yu-Lin Tsai, Yu Zhe, Yan-Lun Chen, Chih-Hsun Lin, Chia-Mu Yu, Yang Zhang, Chun-Ying Huang, Jun Sakuma

TL;DR

This work identifies a novel security risk in Task Vector as a Service (TVaaS) by introducing BadTV, a composite backdoor crafted to remain effective under all primary task arithmetic operations: learning via addition, forgetting via subtraction, and analogy via mixed operations. BadTV uses asymmetric backdoors (b1 and b2) to sustain malicious influence when merged with a pre-trained model. It achieves high attack success rates on CLIP-based classifiers and extends to LLM settings, while preserving normal performance on clean data. Extensive experiments across datasets (MNIST, SVHN, CIFAR, GTSRB) and models (CLIP variants, ConvNeXt, Llama-2-Chat) show BadTV's robustness under various configurations, including multiple clean task vectors and task analogies, and reveal that existing defenses fail to detect these backdoors. The findings underscore the need for robust defenses in TVaaS deployments, since backdoors can compound risks in real-world use cases such as app-like TV stores and cross-task adaptation. The work contributes a concrete threat model, empirical evidence of persistent backdoors in task arithmetic (TA), and a call for defense strategies tailored to TV-based architectures and outputs.
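
The three operations named above follow the usual task-arithmetic formulation: a task vector is the element-wise difference between fine-tuned and pre-trained weights, and a user builds a new model by adding, subtracting, or combining such vectors on top of the pre-trained weights. The sketch below is a minimal illustration under stated assumptions: a "model" is a toy dict of flat Python float lists rather than real tensors, and the helper names (`task_vector`, `learn`, `forget`, `analogy`) and the scaling coefficient `lam` are hypothetical, not taken from the BadTV code.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
# (Hypothetical representation; real models would use framework tensors.)
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """tau = theta_ft - theta_pre, element-wise for every parameter."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])] for k in pretrained}


def scale(tv: Params, c: float) -> Params:
    """Multiply every entry of a task vector by c."""
    return {k: [c * x for x in v] for k, v in tv.items()}


def add(a: Params, b: Params) -> Params:
    """Element-wise sum of two parameter dicts with identical shapes."""
    return {k: [x + y for x, y in zip(a[k], b[k])] for k in a}


def apply_tv(pretrained: Params, tv: Params, lam: float = 1.0) -> Params:
    """theta_new = theta_pre + lam * tau."""
    return add(pretrained, scale(tv, lam))


def learn(pretrained: Params, tvs: List[Params], lam: float = 1.0) -> Params:
    """Task learning: add the sum of one or more task vectors."""
    total = tvs[0]
    for tv in tvs[1:]:
        total = add(total, tv)
    return apply_tv(pretrained, total, lam)


def forget(pretrained: Params, tv: Params, lam: float = 1.0) -> Params:
    """Task forgetting: subtract a task vector (addition with a negative scale)."""
    return apply_tv(pretrained, tv, -lam)


def analogy(pretrained: Params, tv_a: Params, tv_b: Params, tv_c: Params,
            lam: float = 1.0) -> Params:
    """Task analogy 'A : B :: C : D', estimating tau_D as tau_C + (tau_B - tau_A)."""
    tv_d = add(tv_c, add(tv_b, scale(tv_a, -1.0)))
    return apply_tv(pretrained, tv_d, lam)


if __name__ == "__main__":
    pre = {"w": [1.0, 2.0]}
    tau1 = task_vector({"w": [2.0, 2.0]}, pre)  # [1.0, 0.0]
    tau2 = task_vector({"w": [1.0, 4.0]}, pre)  # [0.0, 2.0]
    print(learn(pre, [tau1, tau2]))  # {'w': [2.0, 4.0]}
    print(forget(pre, tau1))         # {'w': [0.0, 2.0]}
```

In a TVaaS setting, any of these vectors could be downloaded from a third party, and the user decides whether it is added, subtracted, or mixed. That is why a backdoored vector has to stay effective regardless of the sign it is applied with, which is the property BadTV is built for.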

Abstract

Task arithmetic in large-scale pre-trained models enables agile adaptation to diverse downstream tasks without extensive retraining. By leveraging task vectors (TVs), users can perform modular updates through simple arithmetic operations like addition and subtraction. Yet, this flexibility presents new security challenges. In this paper, we investigate how TVs are vulnerable to backdoor attacks, revealing how malicious actors can exploit them to compromise model integrity. By creating composite backdoors that are designed asymmetrically, we introduce BadTV, a backdoor attack specifically crafted to remain effective simultaneously under task learning, forgetting, and analogy operations. Extensive experiments show that BadTV achieves near-perfect attack success rates across diverse scenarios, posing a serious threat to models relying on task arithmetic. We also evaluate current defenses and find that they fail to detect or mitigate BadTV. Our results highlight the urgent need for robust countermeasures to secure TVs in real-world deployments.
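
The results trade off two quantities: attack success rate (ASR) on triggered inputs and accuracy on clean inputs. The sketch below shows one common way to compute both, under stated assumptions: the classifier is any Python callable returning a label, `apply_trigger` is a hypothetical function that stamps a trigger (for example b1 or b2) onto an input, and excluding samples whose true label already equals the target is a widespread convention rather than necessarily the paper's exact protocol.

```python
from typing import Callable, Sequence, Tuple

# (input, true label); the input type is left generic for this sketch.
Example = Tuple[object, int]


def clean_accuracy(model: Callable[[object], int], data: Sequence[Example]) -> float:
    """Fraction of clean inputs that the model classifies correctly."""
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)


def attack_success_rate(
    model: Callable[[object], int],
    data: Sequence[Example],
    apply_trigger: Callable[[object], object],
    target: int,
) -> float:
    """Fraction of triggered inputs, among those whose true label is not
    the target, that the model assigns to the attacker's target label."""
    victims = [x for x, y in data if y != target]
    if not victims:
        raise ValueError("no samples with a non-target label to attack")
    hits = sum(1 for x in victims if model(apply_trigger(x)) == target)
    return hits / len(victims)
```

A backdoor is considered successful when ASR is close to 1 while clean accuracy stays close to that of the unmodified model.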

Paper Structure (55 sections, 2 equations, 24 figures, 7 tables)

Figures (24)

  • Figure 1: Task vector as a service (TVaaS).
  • Figure 2: The workflow of BadTV.
  • Figure 3: Visualization of a traditional backdoor attack (BD) and BadTV under the possible TV operations. BadTV remains effective in both arithmetic scenarios, whereas traditional BD does not.
  • Figure 4: Comparison of results for the BTV on GTSRB trained with different backdoor attacks, combined with various CTVs.
  • Figure 5: Different backdoor combinations in BadTV. In the "same" setting, triggers $b_{1}$ and $b_{2}$ are both generated by the same attack. In the "different" setting, $b_{1}\_b_{2}$ denotes the attacks used for $b_{1}$ and $b_{2}$, respectively, when constructing the BTV.
  • ...and 19 more figures