Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

Maan Qraitem; Nazia Tasnim; Piotr Teterwak; Kate Saenko; Bryan A. Plummer

TL;DR

This work investigates typographic attacks on large vision-language models (LVLMs) and introduces Self-Generated Typographic Attacks, which use the models themselves to craft deceptive text. It presents two attack families, Class-Based Attacks and Reasoned Attacks, which exploit class similarity and language reasoning, respectively. Empirical results show that these attacks can markedly degrade LVLM performance (accuracy drops of up to ~60%) across multiple models and datasets, with Reasoned Attacks often delivering the strongest effects on LVLMs such as GPT-4V. The findings highlight a critical vulnerability in LVLMs' reliance on textual cues and language understanding, underscoring the need for defenses and for further evaluation across diverse models and domains.
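The attack strength above is reported as an accuracy drop. The sketch below shows one way to compute it under a stated assumption: the drop is taken as absolute percentage points (clean accuracy minus accuracy on text-overlaid images), then averaged over datasets. The paper's exact definition is not given in this summary, and the helper names and toy labels are hypothetical.

```python
# Minimal sketch of an accuracy-drop metric for typographic attacks.
# ASSUMPTION: drop = clean accuracy - attacked accuracy, in absolute
# percentage points; the paper's exact definition is not stated here.


def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Percentage of predictions that exactly match the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def accuracy_drop(clean_preds: list[str], attacked_preds: list[str], labels: list[str]) -> float:
    """Clean accuracy minus accuracy on the attacked (text-overlaid) images."""
    return accuracy(clean_preds, labels) - accuracy(attacked_preds, labels)


def average_drop(per_dataset: dict[str, float]) -> float:
    """Mean of per-dataset drops (e.g. over Flowers, Food101, Cars, Pets, Aircraft)."""
    return sum(per_dataset.values()) / len(per_dataset)


if __name__ == "__main__":
    labels = ["rose", "tulip", "daisy", "lily"]
    clean = ["rose", "tulip", "daisy", "lily"]      # toy data: 100% clean accuracy
    attacked = ["rose", "lily", "tulip", "lily"]    # toy data: 50% under attack
    print(accuracy_drop(clean, attacked, labels))   # 50.0
```

A relative drop (dividing by clean accuracy) is an equally plausible convention; the absolute version is used here only because it is the simpler of the two.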

Abstract

Typographic attacks, which add misleading text to images, can deceive large vision-language models (LVLMs). The susceptibility of recent LVLMs such as GPT-4V to such attacks is understudied, raising concerns about amplified misinformation in personal assistant applications. Previous attacks use simple strategies, such as random misleading words, which do not fully exploit LVLMs' language reasoning abilities. We introduce an experimental setup for testing typographic attacks on LVLMs and propose two novel self-generated attacks: (1) Class-based attacks, where the model identifies a similar class to deceive itself, and (2) Reasoned attacks, where an advanced LVLM suggests an attack combining a deceiving class and a description. Our experiments show that these attacks reduce classification performance by up to 60% and are effective across different models, including InstructBLIP and MiniGPT4. Code: https://github.com/mqraitem/Self-Gen-Typo-Attack
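To make the difference between a Class-based attack and the prior random-class baseline concrete, here is a minimal sketch of the similarity-based variant (the CLIP-style route in Figure 2a). It assumes a query embedding (of the image or of the ground-truth class name) and text embeddings for every class name, and picks the most similar class other than the ground truth. The embeddings, class names, and function names are hypothetical placeholders, not the repository's pipeline.

```python
import math
import random


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def class_based_attack(query_emb: list[float], class_embs: dict[str, list[float]], true_class: str) -> str:
    """Return the class most similar to the query, excluding the ground truth.

    ASSUMPTION: query_emb stands in for a CLIP-style embedding and class_embs
    maps class names to text embeddings; both are illustrative placeholders.
    """
    candidates = {c: e for c, e in class_embs.items() if c != true_class}
    return max(candidates, key=lambda c: cosine(query_emb, candidates[c]))


def random_class_attack(classes: list[str], true_class: str, rng: random.Random) -> str:
    """Prior-work style baseline: sample any non-ground-truth class uniformly."""
    return rng.choice([c for c in classes if c != true_class])


if __name__ == "__main__":
    # Toy 3-d embeddings, made up purely for illustration.
    embs = {
        "Boeing 737": [0.9, 0.1, 0.0],
        "Airbus A320": [0.8, 0.2, 0.1],
        "Cessna 172": [0.1, 0.9, 0.2],
    }
    print(class_based_attack(embs["Boeing 737"], embs, "Boeing 737"))  # Airbus A320
    print(random_class_attack(list(embs), "Boeing 737", random.Random(0)))
```

The chosen class name would then be rendered onto the image as the attack text. The generative-LVLM route in Figure 2a replaces the similarity search with a prompt asking the model for the most similar class.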

Paper Structure (15 sections, 1 equation, 8 figures, 2 tables)

Introduction
Related Work
Problem Definition
Self Generated Typographic Attacks
Self Generated Class-Based Attacks
Self Generated Reasoned Attacks
Typographic Attacks against Large Vision Language Models
Experiments
Results
Qualitative Examples
Ablations
Conclusion
Results for each dataset
Results for each dataset
Attack Location

Figures (8)

Figure 1: Typographic Attack Comparison. (a) Prior work's typographic attacks (which were designed for CLIP) randomly sample a deceiving class from the dataset's categories to attack the Large Vision Language Model (LVLM) azuma2023defense. (b) Our more effective Self-Generated attack, which uses the LVLM itself to generate the attack.
Figure 2: Self-Generated Attacks Comparison. Overview of the two types of our Self-Generated Attacks: (a) Class-Based and (b) Reasoned Attacks. Class-based attacks query either a generative Large Vision Language Model (LVLM) (e.g., LLaVA liu2023llava) or a text-image similarity-based VL model (e.g., CLIP radford2021learning) for the class most similar to the ground truth and use that class as the attack. Reasoned attacks prompt a generative LVLM to recommend an effective attack; the LVLM then returns a deceiving class and its reasoning.
Figure 3: The prompt used to generate the Reasoned typographic attack with GPT-4V yang2023dawn. {dataset subject} refers to the category of the dataset classes (e.g., Car Model for Cars). Refer to the Self Generated Typographic Attacks section for further details.
Figure 4: Qualitative examples where our Reasoned attacks (Column 3) cause the model to misclassify the image while our Class-Based Attacks (Column 2) and Random Class Attacks azuma2023defense fail to do so, on the five datasets used in our experimental setup: the Aircraft dataset maji2013fine (Row 1), the StanfordCars dataset krause20133d (Row 2), the Flowers dataset nilsback2008automated (Row 3), the OxfordPets dataset parkhi2012cats (Row 4), and the Food101 dataset bossard2014food (Row 5). Refer to the Qualitative Examples section for discussion.
Figure 5: Comparing the average % accuracy drop caused by our Reasoned Attack on 1) GPT-4V yang2023dawn, 2) MiniGPT-4 zhu2023minigpt, 3) InstructBLIP dai2023instructblip, and 4) LLaVA1.5 liu2023llava across the five datasets that comprise our experimental setup: 1) Flowers, 2) Food101, 3) Cars, 4) Pets, and 5) Aircraft. Refer to the Ablations section for further discussion.
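Figures 2 and 3 describe the Reasoned attack as a prompt with a {dataset subject} slot whose reply contains a deceiving class plus a reasoning, which together form the attack text. The sketch below shows that plumbing under explicit assumptions: the template wording and the "Class:/Reasoning:" reply format are hypothetical stand-ins (the actual GPT-4V prompt is not reproduced here), no model is called, and all names are illustrative.

```python
from dataclasses import dataclass

# HYPOTHETICAL template: not the paper's prompt, only a stand-in with the
# same {dataset subject} slot (written as {dataset_subject} for str.format).
HYPOTHETICAL_TEMPLATE = (
    "The image shows a {dataset_subject}. Suggest a misleading {dataset_subject} "
    "to write on the image, and a short description that supports it.\n"
    "Answer as:\nClass: <class>\nReasoning: <description>"
)


@dataclass
class ReasonedAttack:
    deceiving_class: str
    reasoning: str

    def overlay_text(self) -> str:
        """Text to render onto the image: deceiving class plus its description."""
        return f"{self.deceiving_class}: {self.reasoning}"


def build_prompt(dataset_subject: str) -> str:
    """Fill the subject slot, e.g. 'Car Model' for StanfordCars."""
    return HYPOTHETICAL_TEMPLATE.format(dataset_subject=dataset_subject)


def parse_reply(reply: str, true_class: str) -> ReasonedAttack:
    """Extract the deceiving class and reasoning from a reply in the ASSUMED format."""
    fields = {}
    for line in reply.splitlines():
        # Split on the first colon only, so the reasoning may itself contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    if "class" not in fields or "reasoning" not in fields:
        raise ValueError("reply does not follow the assumed 'Class:/Reasoning:' format")
    if fields["class"].lower() == true_class.lower():
        raise ValueError("the suggested class must differ from the ground truth")
    return ReasonedAttack(fields["class"], fields["reasoning"])


if __name__ == "__main__":
    print(build_prompt("Car Model"))
    reply = "Class: Audi R8\nReasoning: The sleek mid-engine profile matches an R8."
    attack = parse_reply(reply, true_class="Lamborghini Gallardo")
    print(attack.overlay_text())  # Audi R8: The sleek mid-engine profile matches an R8.
```

Rejecting a reply that repeats the ground-truth class is a design choice of this sketch, since such a suggestion could not mislead the classifier.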