Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks

Shengyao Zhuang, Ekaterina Khramtsova, Xueguang Ma, Bevan Koopman, Jimmy Lin, Guido Zuccon

TL;DR

This work examines the vulnerability of vision-language model (VLM)-based document screenshot retrievers to pixel-level adversarial attacks. It introduces three gradient-based attack methods (Direct Optimisation, Noise Optimisation, and Mask Direct Optimisation) that manipulate screenshot embeddings to poison rankings. Injecting a single adversarial screenshot poisons the top-10 results for 41.9% of queries with DSE and 26.4% with ColPali in in-domain tests, with stronger effects when target queries are known or multiple adversarial instances are injected. These findings have practical implications for corpus poisoning and SEO, and highlight the need for defenses and more robust architectures when deploying VLM-based retrieval systems.

Abstract

Recent advancements in dense retrieval have introduced vision-language model (VLM)-based retrievers, such as DSE and ColPali, which leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods. In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers and evaluate their effectiveness under various attack settings and parameter configurations. Our empirical results demonstrate that injecting even a single adversarial screenshot into the retrieval corpus can significantly disrupt search results, poisoning the top-10 retrieved documents for 41.9% of queries in the case of DSE and 26.4% for ColPali. These vulnerability rates notably exceed those observed with equivalent attacks on text-only retrievers. Moreover, when targeting a small set of known queries, the attack success rate rises, achieving complete success in certain cases. By exposing the vulnerabilities inherent in vision-language models, this work highlights the potential risks associated with their deployment.
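
The headline numbers above are the fraction of queries whose top-10 ranking contains an adversarial screenshot. A minimal sketch of how such a rate could be computed is given below. The function name `attack_success_rate`, the score dictionary layout, and the toy values are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from typing import Dict, Set


def attack_success_rate(
    scores: Dict[str, Dict[str, float]],
    adversarial_ids: Set[str],
    k: int = 10,
) -> float:
    """Fraction of queries whose top-k ranking contains an adversarial document.

    `scores` maps query id -> {document id -> similarity score}; this layout
    is a hypothetical choice for the sketch.
    """
    if not scores:
        return 0.0
    poisoned = 0
    for doc_scores in scores.values():
        # Rank documents by descending similarity and keep the top k.
        top_k = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        if any(doc_id in adversarial_ids for doc_id in top_k):
            poisoned += 1
    return poisoned / len(scores)


# Toy example: two queries, three clean documents, one injected screenshot "adv".
toy_scores = {
    "q1": {"d1": 0.9, "d2": 0.4, "d3": 0.2, "adv": 0.7},
    "q2": {"d1": 0.8, "d2": 0.6, "d3": 0.5, "adv": 0.1},
}
rate = attack_success_rate(toy_scores, {"adv"}, k=2)
print(rate)
```

In this toy setting the injected screenshot reaches the top 2 for `q1` only, so half of the queries count as poisoned.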

Paper Structure

This paper contains 21 sections, 6 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Direct Optimisation. Left: original image; middle: 10% of the gradient is updated; right: 100% of the gradient is updated.
  • Figure 2: Noise Optimisation. Left: original image; middle: 10% of the gradient is updated; right: 100% of the gradient is updated.
  • Figure 3: Mask Optimisation. Left: original image; middle: 5% mask; right: 20% mask.
  • Figure 4: Impact of gradient optimisation percentage $p$ on attack effectiveness over target queries. Lower percentages of optimised gradient $p$ result in less visual corruption of the document screenshot.
  • Figure 5: Impact of mask area $a$ on attack effectiveness over target queries. Lower percentages of mask area $a$ result in less visual corruption of the document screenshot.
  • ...and 3 more figures
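
The captions above refer to two knobs: the percentage $p$ of the gradient that is optimised and the mask area $a$. The sketch below shows one plausible reading of each, on a flattened pixel vector. It assumes that "$p$% of the gradient" means keeping the largest-magnitude entries, that the mask is a 0/1 vector marking editable pixels, and that pixel values are clamped to [0, 1]. None of these details are stated in the text above, and the helper names `keep_top_fraction` and `masked_step` are hypothetical.

```python
from typing import List


def keep_top_fraction(grad: List[float], p: float) -> List[float]:
    """Zero all but the top-p fraction of gradient entries by absolute value.

    Assumption: selecting by magnitude is only one possible interpretation
    of "p% of the gradient is updated".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    n_keep = int(round(p * len(grad)))
    # Indices sorted by descending gradient magnitude.
    ranked = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def masked_step(
    pixels: List[float],
    grad: List[float],
    mask: List[int],
    step_size: float,
) -> List[float]:
    """Apply a gradient step only where mask == 1, clamping pixels to [0, 1]."""
    return [
        min(1.0, max(0.0, x + step_size * g * m))
        for x, g, m in zip(pixels, grad, mask)
    ]


# Toy usage: keep half of the gradient entries, then update only masked pixels.
sparse_grad = keep_top_fraction([0.1, -0.9, 0.5, 0.2], p=0.5)
updated = masked_step([0.2, 0.5, 0.9], [1.0, 1.0, 1.0], [1, 0, 1], step_size=0.2)
```

Under these assumptions, a smaller $p$ or a smaller mask touches fewer pixels, which matches the captions' remark that lower values cause less visual corruption.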