SLANT: Spurious Logo ANalysis Toolkit
Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer
TL;DR
SLANT addresses spurious logo correlations in Vision-Language Foundation Models by introducing a semi-automatic toolkit that mines a comprehensive logo bank, CC12M-LogoBank, for logos that spuriously correlate with downstream targets. The method combines logo bank construction with a spuriousity metric to uncover logos that bias content moderation, object recognition, and adjective-based judgments. It also offers two training-free mitigations, 10-crop augmentation and logo masking via OWLv2, that integrate with zero-shot inference. Empirical results show that logos can cause harmful content to be classified as harmless, degrade ImageNet zero-shot accuracy, and amplify negative human associations, with mitigation providing partial relief. The work also outlines a realistic threat model for logo-based attacks and discusses ethical implications and future directions as logos continue to permeate online data.
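To illustrate why 10-crop augmentation can dilute a logo's influence, here is a minimal, dependency-free sketch. It assumes the standard ten-crop scheme (four corner crops and a center crop, plus their horizontal flips) and simple averaging of per-crop class scores; the exact crop size and aggregation SLANT uses are not stated here. The names `ten_crop`, `ten_crop_predict`, and `score_fn` are hypothetical, and `score_fn` stands in for a zero-shot VLM scorer. Because a small logo usually falls inside only some of the crops, its effect on the averaged score is reduced.

```python
from typing import Callable, List, Sequence

# Toy grayscale image: a list of rows of pixel values (stand-in for a real tensor).
Image = List[List[float]]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size square whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def hflip(img: Image) -> Image:
    """Mirror the image horizontally."""
    return [list(reversed(row)) for row in img]


def ten_crop(img: Image, size: int) -> List[Image]:
    """Four corner crops and a center crop, followed by horizontal flips of each."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size must not exceed image size")
    corners_and_center = [
        (0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
        ((h - size) // 2, (w - size) // 2),
    ]
    crops = [crop(img, top, left, size) for top, left in corners_and_center]
    return crops + [hflip(c) for c in crops]


def ten_crop_predict(
    img: Image, size: int, score_fn: Callable[[Image], Sequence[float]]
) -> List[float]:
    """Average the per-class scores that score_fn (hypothetical VLM scorer) gives each crop."""
    per_crop = [score_fn(c) for c in ten_crop(img, size)]
    n_classes = len(per_crop[0])
    return [sum(s[k] for s in per_crop) / len(per_crop) for k in range(n_classes)]


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    # Toy scorer: a single "class" whose score is the mean pixel value of the crop.
    mean_score = lambda c: [sum(map(sum, c)) / (len(c) * len(c[0]))]
    print(ten_crop_predict(image, 2, mean_score))
```

In a real pipeline each crop would be resized to the model's input resolution before scoring; that step is omitted to keep the sketch self-contained.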
Abstract
Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which are used for a wide array of tasks (e.g., content moderation and object classification). While these models have been shown to learn harmful correlations in various tasks, whether these correlations include logos remains understudied. Understanding this is especially important because logos are often used by public-facing entities such as brands and government agencies. To that end, we develop SLANT: A Spurious Logo ANalysis Toolkit. Our key finding is that some logos indeed lead to spurious incorrect predictions; for example, adding the Adidas logo to a photo of a person causes a model to classify the person as greedy. SLANT contains a semi-automatic mechanism for mining such "spurious" logos. The mechanism consists of a comprehensive logo bank, CC12M-LogoBank, and an algorithm that searches the bank for logos that VLMs spuriously correlate with a user-provided downstream recognition target. We uncover various seemingly harmless logos that VL models correlate 1) with negative human adjectives; 2) with the concept of "harmlessness", causing models to misclassify harmful online content as harmless; and 3) with user-provided object concepts, causing lower recognition accuracy on ImageNet zero-shot classification. Furthermore, SLANT's logos can be seen as effective attacks against foundation models: an attacker could place a spurious logo on harmful content, causing the model to misclassify it as harmless. This threat is alarming given how simple logo attacks are, and it enlarges the attack surface of VL models. As a defense, our toolkit includes two effective mitigation strategies that seamlessly integrate with zero-shot inference of foundation models.
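The search over the logo bank can be illustrated with a toy sketch. The abstract does not define SLANT's scoring rule, so the score below is a hypothetical stand-in modeled on the attack described above: among images the classifier does not already assign to the target, the fraction whose prediction flips to the target once the logo is pasted on. The names `paste`, `spuriousness`, `rank_logos`, and `classify` are illustrative assumptions, and `classify` stands in for a zero-shot VLM.

```python
from typing import Callable, Dict, List, Tuple

# Toy image: a list of rows of pixel values (stand-in for a real image tensor).
Image = List[List[int]]


def paste(img: Image, logo: Image, top: int = 0, left: int = 0) -> Image:
    """Return a copy of img with logo written over it at (top, left).

    Assumes the logo fits inside the image at that position.
    """
    out = [row[:] for row in img]
    for i, logo_row in enumerate(logo):
        for j, px in enumerate(logo_row):
            out[top + i][left + j] = px
    return out


def spuriousness(
    logo: Image, images: List[Image], target: str, classify: Callable[[Image], str]
) -> float:
    """Hypothetical score: share of non-target images flipped to target by the logo."""
    candidates = [im for im in images if classify(im) != target]
    if not candidates:
        return 0.0
    flipped = sum(1 for im in candidates if classify(paste(im, logo)) == target)
    return flipped / len(candidates)


def rank_logos(
    logo_bank: Dict[str, Image],
    images: List[Image],
    target: str,
    classify: Callable[[Image], str],
) -> List[Tuple[str, float]]:
    """Score every logo in the bank and sort from most to least spurious."""
    scores = {name: spuriousness(logo, images, target, classify)
              for name, logo in logo_bank.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy "moderation model": calls an image harmless iff its top-left pixel is 9.
    toy_classify = lambda im: "harmless" if im[0][0] == 9 else "harmful"
    bank = {"logo_a": [[9]], "logo_b": [[1]]}
    harmful_images = [[[0, 0], [0, 0]], [[5, 5], [5, 5]]]
    print(rank_logos(bank, harmful_images, "harmless", toy_classify))
```

Restricting the score to images not already predicted as the target keeps logos from being credited for predictions the model would have made anyway.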
