As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr
TL;DR
This work investigates whether open-source foundation models such as CLIP propagate adversarial vulnerabilities to downstream vision-language tasks. It introduces Patch Representation Misalignment (PRM), a cross-task attack that perturbs input images to distort intermediate CLIP representations via a patch-wise cosine-similarity objective, formalized as $L_{PRM} = \sum_{l\in L} \sum_{p=0}^{\lceil HW/d^2 \rceil} \frac{f^p_l \cdot f'^p_l}{\|f^p_l\| \|f'^p_l\|}$. Using only publicly available CLIP vision encoders as surrogates, PRM transfers substantially to more than 20 downstream models across four tasks, namely open-vocabulary semantic segmentation (OVS), open-vocabulary object detection (OVD), image captioning (IC), and visual question answering (VQA), and outperforms task-specific and cross-task baselines. These results reveal a significant safety risk: vulnerabilities inherited from a foundation model can propagate to diverse downstream systems, underscoring the need for defense strategies and robust training approaches wherever open-source foundation models are deployed.
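To make the objective concrete, the following is a minimal NumPy sketch of the patch-wise cosine-similarity sum, not the authors' implementation. It assumes that $f^p_l$ and $f'^p_l$ denote the clean and adversarial features of patch $p$ at layer $l$, that each layer's features are already available as an array of shape (num_patches, dim), and that the attacker would minimize the returned value. The function name `prm_loss` and the stabilizer `eps` are hypothetical additions for illustration.

```python
import numpy as np


def prm_loss(clean_feats, adv_feats, eps=1e-12):
    """Sum of patch-wise cosine similarities over the selected layers.

    clean_feats, adv_feats: sequences with one entry per layer l in L.
    Each entry is an array of shape (num_patches, dim) holding the
    (assumed) clean features f_l^p and adversarial features f'_l^p.
    eps is a hypothetical stabilizer that avoids division by zero.
    """
    total = 0.0
    for f, f_adv in zip(clean_feats, adv_feats):
        f = np.asarray(f, dtype=np.float64)
        f_adv = np.asarray(f_adv, dtype=np.float64)
        # Dot product of each clean patch with its adversarial counterpart.
        dots = np.sum(f * f_adv, axis=-1)
        # Product of the two patch norms (denominator of the cosine).
        norms = np.linalg.norm(f, axis=-1) * np.linalg.norm(f_adv, axis=-1)
        total += float(np.sum(dots / (norms + eps)))
    return total


# Toy check with random stand-in features: 2 layers, 4 patches, 8 dimensions.
rng = np.random.default_rng(0)
clean = [rng.normal(size=(4, 8)) for _ in range(2)]

# Identical features are perfectly aligned: one unit of similarity per patch per layer.
same = prm_loss(clean, clean)
# Negated features are perfectly misaligned: the sum reaches its minimum.
flipped = prm_loss(clean, [-f for f in clean])
```

In this toy setup the loss ranges between the number of patch-layer pairs (fully aligned) and its negative (fully misaligned). A real attack would presumably obtain the features from a CLIP vision encoder and update the perturbation by gradient steps under a norm budget; those steps are outside this sketch.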
Abstract
Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems. While pre-training offers clear advantages for downstream learning, it also endows downstream models with shared adversarial vulnerabilities that can be easily identified through the open-sourced foundation model. In this work, we expose such vulnerabilities in CLIP's downstream models and show that foundation models can serve as a basis for attacking their downstream systems. In particular, we propose a simple yet effective adversarial attack strategy termed Patch Representation Misalignment (PRM). Solely based on open-sourced CLIP vision encoders, this method produces adversaries that simultaneously fool more than 20 downstream models spanning 4 common vision-language tasks (semantic segmentation, object detection, image captioning and visual question-answering). Our findings highlight the concerning safety risks introduced by the extensive usage of public foundational models in the development of downstream systems, calling for extra caution in these scenarios.
