Typographic Attacks in a Multi-Image Setting
Xiaomeng Wang, Zhengyu Zhao, Martha Larson
TL;DR
This work studies typographic attacks on Large Vision-Language Models (LVLMs) in a realistic multi-image setting, where attack texts must not repeat across an image set. It shows that text-image similarity, computed in the embedding space, is a strong predictor of attack success, and it uses this finding to motivate two families of strategies: text-image similarity–based and attack-text-effectiveness–based approaches, both evaluated under non-repeating, one-to-one matching of attack texts to images. Text-image similarity strategies substantially outperform random baselines (e.g., a 21% gain in attack success rate on ImageNet with CLIP) and remain stealthy because no attack text is repeated. Similarity computed with CLIP also transfers to attacks on InstructBLIP in a greybox setting. The findings expose embedding-space vulnerabilities in LVLMs, provide a framework for evaluating defenses, and point to future work on more naturalistic attack texts and broader model coverage.
Abstract
Large Vision-Language Models (LVLMs) are susceptible to typographic attacks, in which an attack text added to an image causes the model to misclassify it. In this paper, we introduce a multi-image setting for studying typographic attacks, broadening the literature's current emphasis on attacking individual images. Specifically, we focus on attacking image sets without repeating the attack text. Such non-repeating attacks are stealthier, as they are more likely to evade a gatekeeper than attacks that reuse the same attack text. We introduce two attack strategies for the multi-image setting, which leverage the difficulty of the target image, the strength of the attack text, and text-image similarity. Our text-image similarity approach improves attack success rates by 21% over random, non-specific methods on the CLIP model using ImageNet, while maintaining stealth in the multi-image scenario. An additional experiment demonstrates transferability: text-image similarity calculated using CLIP remains effective when attacking InstructBLIP.
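The non-repeating constraint turns the text-image similarity strategy into an assignment problem: every image in the set must receive a different attack text. The sketch below is a minimal illustration, not the authors' implementation. It assumes image and attack-text embeddings have already been computed with the same encoder (e.g. CLIP), and it assumes that a higher cosine similarity marks a more promising image/text pairing. The names `cosine`, `assign_attack_texts`, and the `excluded` parameter (text indices an image must not receive, such as its true class) are hypothetical. Brute force over injective assignments is exact but only practical for very small sets; for realistic set sizes the Hungarian algorithm (for instance `scipy.optimize.linear_sum_assignment` with `maximize=True`) solves the same problem efficiently.

```python
import math
from itertools import permutations
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def assign_attack_texts(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    excluded: Optional[Sequence[set]] = None,
) -> list[int]:
    """Give each image a distinct attack text, maximizing total similarity.

    Hypothetical helper. ASSUMPTION: higher image-text similarity is treated
    as a better attack pairing. excluded[i] holds text indices that image i
    must not receive (e.g. the text naming its true class).
    Returns result[i] = index of the attack text assigned to image i.
    """
    n_img, n_txt = len(image_embs), len(text_embs)
    if n_txt < n_img:
        raise ValueError("need at least as many attack texts as images")
    # Similarity matrix: rows are images, columns are attack texts.
    sim = [[cosine(im, tx) for tx in text_embs] for im in image_embs]
    best, best_score = None, -math.inf
    # Each permutation of length n_img is a one-to-one (non-repeating) assignment.
    for perm in permutations(range(n_txt), n_img):
        if excluded and any(perm[i] in excluded[i] for i in range(n_img)):
            continue
        score = sum(sim[i][perm[i]] for i in range(n_img))
        if score > best_score:
            best, best_score = list(perm), score
    if best is None:
        raise ValueError("no valid non-repeating assignment exists")
    return best


# Toy 2-D embeddings standing in for real encoder outputs.
image_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_embs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8], [-1.0, 0.0]]

print(assign_attack_texts(image_embs, text_embs))  # -> [0, 1, 2]
# If text 0 names image 0's true class, the one-to-one constraint reshuffles
# the whole set rather than just image 0.
print(assign_attack_texts(image_embs, text_embs, excluded=[{0}, set(), set()]))  # -> [2, 1, 0]
```

The second call shows why the multi-image setting differs from attacking images one at a time: forbidding a single pairing changes the best choice for other images too, because no attack text may be used twice.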
