SLANT: Spurious Logo ANalysis Toolkit

Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

TL;DR

SLANT addresses spurious logo correlations in Vision-Language Foundation Models by introducing a semi-automatic toolkit that mines a comprehensive logo bank, CC12M-LogoBank, for logos that spuriously correlate with downstream targets. The method combines logo bank construction with a spuriousity metric to uncover logos that bias content moderation, object recognition, and adjective-based judgments, and it offers two non-training mitigations—10-crop augmentation and logo masking via OWLv2—that integrate with zero-shot inference. Empirical results show logos can cause harmful content to be classified as harmless, degrade ImageNet zero-shot accuracy, and amplify negative human associations, with mitigation providing partial relief. The work also outlines a realistic threat model for logo-based attacks and discusses ethical implications and future directions as logos continue to permeate online data.
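
The 10-crop mitigation can be illustrated with a minimal sketch. It assumes the standard ten-crop layout (four corners and the center, each with and without a horizontal flip, the same layout torchvision's `TenCrop` produces) and averages class probabilities across crops. This summary does not specify the crop sizes or aggregation SLANT actually uses, so `ten_crop_predict` and the `score_crop` callback are hypothetical stand-ins for a CLIP zero-shot scorer. The assumed intuition is that a logo pasted near one corner falls inside only some of the crops, so its pull on the averaged prediction is diluted.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def ten_crop_boxes(width: int, height: int, crop_w: int, crop_h: int) -> List[Tuple[Box, bool]]:
    """Four corner crops plus a center crop, each returned unflipped and flipped."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop size must fit inside the image")
    cx, cy = (width - crop_w) // 2, (height - crop_h) // 2
    base = [
        (0, 0, crop_w, crop_h),                            # top-left
        (width - crop_w, 0, width, crop_h),                # top-right
        (0, height - crop_h, crop_w, height),              # bottom-left
        (width - crop_w, height - crop_h, width, height),  # bottom-right
        (cx, cy, cx + crop_w, cy + crop_h),                # center
    ]
    # The flag tells the scorer whether to mirror the crop horizontally.
    return [(box, flip) for flip in (False, True) for box in base]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ten_crop_predict(
    width: int,
    height: int,
    crop_w: int,
    crop_h: int,
    score_crop: Callable[[Box, bool], Sequence[float]],
) -> List[float]:
    """Average per-crop class probabilities (hypothetical helper; score_crop stands in for CLIP)."""
    probs = [softmax(score_crop(box, flip)) for box, flip in ten_crop_boxes(width, height, crop_w, crop_h)]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]


# Toy usage: a logo occupies the top-left 20x20 pixels of a 256x256 image.
LOGO: Box = (0, 0, 20, 20)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def toy_scorer(box: Box, flip: bool) -> List[float]:
    # Crops that contain any part of the logo are pushed toward class 1.
    # A real scorer would mirror the pixels when flip is True; this toy ignores it.
    return [0.0, 3.0] if overlaps(box, LOGO) else [2.0, 0.0]


probs = ten_crop_predict(256, 256, 224, 224, toy_scorer)
```

In this toy run the logo overlaps only 4 of the 10 crops (the top-left and center crops plus their flips), so the averaged prediction stays with class 0 even though the crops containing the logo vote for class 1. It also hints at why the relief is only partial: a logo placed near the image center would fall inside more of the crops.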

Abstract

Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which are used for a wide array of tasks (e.g., content moderation and object classification). While these models have been shown to learn harmful correlations in various tasks, whether these correlations include logos remains understudied. Understanding this is especially important because logos are often used by public-facing entities like brands and government agencies. To that end, we develop SLANT: A Spurious Logo ANalysis Toolkit. Our key finding is that some logos indeed lead to spurious incorrect predictions; for example, adding the Adidas logo to a photo of a person causes a model to classify the person as greedy. SLANT contains a semi-automatic mechanism for mining such "spurious" logos. The mechanism consists of a comprehensive logo bank, CC12M-LogoBank, and an algorithm that searches the bank for logos that VLMs spuriously correlate with a user-provided downstream recognition target. We uncover various seemingly harmless logos that VL models correlate 1) with negative human adjectives, 2) with the concept of "harmlessness", causing models to misclassify harmful online content as harmless, and 3) with user-provided object concepts, causing lower recognition accuracy on ImageNet zero-shot classification. Furthermore, SLANT's logos can be seen as effective attacks against foundation models: an attacker could place a spurious logo on harmful content, causing the model to misclassify it as harmless. This threat is alarming considering the simplicity of logo attacks, which increase the attack surface of VL models. As a defense, we include in our toolkit two effective mitigation strategies that seamlessly integrate with zero-shot inference of foundation models.
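
SLANT's mining algorithm and its spuriousity metric are not detailed in this summary, so the following is only a hypothetical proxy for the ranking step: it scores each logo in the bank by how much closer its embedding lies to the target prompt (e.g., "harmless") than to the closest competing prompt, then returns the top-k logos. The toy 3-dimensional vectors stand in for CLIP image and text embeddings, and `margin_score` and `rank_logos` are illustrative names, not part of SLANT.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def margin_score(logo_emb: Sequence[float], target_emb: Sequence[float],
                 other_embs: Sequence[Sequence[float]]) -> float:
    """Hypothetical proxy: similarity to the target prompt minus the best competing similarity."""
    return cosine(logo_emb, target_emb) - max(cosine(logo_emb, o) for o in other_embs)


def rank_logos(logo_bank: Dict[str, Sequence[float]], target_emb: Sequence[float],
               other_embs: Sequence[Sequence[float]], top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k (logo name, score) pairs, highest score first."""
    scored = [(name, margin_score(emb, target_emb, other_embs)) for name, emb in logo_bank.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings: axis 0 ~ "harmless" prompt, axis 1 ~ "hateful" prompt.
bank = {
    "logo_a": [0.9, 0.1, 0.0],
    "logo_b": [0.1, 0.9, 0.0],
    "logo_c": [0.5, 0.5, 0.7],
}
ranked = rank_logos(bank, target_emb=[1.0, 0.0, 0.0], other_embs=[[0.0, 1.0, 0.0]], top_k=2)
print(ranked)  # logo_a first, then logo_c with a score of 0.0
```

Scoring logo embeddings in isolation is cheap but ignores context; measuring how often pasting a logo onto non-target images flips the model's prediction to the target would be closer to the downstream effect described above.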

Paper Structure

This paper contains 18 sections, 3 equations, 10 figures, and 1 algorithm.

Figures (10)

  • Figure 1: Uncovering Spurious Logos with SLANT. SLANT uncovers spurious logos across three diverse visual recognition tasks: it reveals that foundation models spuriously correlate (a) the Motorola logo with predicting "harmless", which results in misclassifying hurtful content as harmless, (b) the Adidas logo with negative human adjectives (e.g., Greedy), and (c) a migrant education logo with "Traffic Light", leading to the misclassification of a parking meter.
  • Figure 2: Curating CC12M-LogoBank. An overview of how we construct CC12M-LogoBank. We build on the observation that logos appear as standalone images in web-scale datasets such as CC12M [Changpinyo_2021_CVPR]. Using this, we filter CC12M with CLIP [radford2021learning] and a set of prompts that describe logos. Samples from CC12M-LogoBank are shown on the right. Refer to Section [sec:artifact_dataset] for further discussion.
  • Figure 3: SLANT's Spurious Logos. We present several sample logos mined by SLANT that spuriously correlate with various visual tasks: (a) logos that spuriously correlate with predicting a hateful meme as harmless, (b) logos that spuriously correlate with four ImageNet [deng2009imagenet] classes (Traffic Light, iPod, Starfish, and Parachute), and (c) logos that spuriously correlate with four negative adjectives (Hostile, Arrogant, Cruel, and Criminal). Refer to Section [sec:mining_alg] for further discussion.
  • Figure 4: SLANT's Spurious Logos Break Content Moderation Systems. Performance of CLIP-based content moderation systems (a) Sum [burbi2023mapping], (b) Hate-CLIPper [kumar2022hate], and (c) ISSUES [burbi2023mapping] on the Hateful Meme Classification benchmark [kiela2020hateful] when we use spurious logos that correlate with the class "harmless". We report accuracy and the true positive (hateful) rate as we increase the number of pasted spurious logos from 0 to 4. The logos effectively reduce the accuracy of the hateful-meme classifiers (blue line), and in some cases the true positive rate approaches zero (blue line). Refer to Section [sec:logos_against_hmc] for further discussion.
  • Figure 5: SLANT's Spurious Logos Disrupt ImageNet Classification Accuracy. Performance of CLIP [radford2021learning] on ImageNet [deng2009imagenet] when we use different logos that spuriously correlate with four ImageNet classes: (a) Traffic Light, (b) iPod, (c) Starfish, and (d) Parachute. As we increase the number of pasted spurious logos from 0 to 4, we report both the total zero-shot accuracy and the precision of the targeted class (e.g., Traffic Light). As the number of pasted logos increases, the total zero-shot accuracy decreases significantly, and the precision of the targeted class approaches zero in some cases (blue line). Refer to Section [sec:logos_against_image_net] for further discussion.
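
The CLIP-based filtering described in Figure 2 can be sketched as zero-shot classification between logo prompts and non-logo prompts: an image is kept when the softmax probability mass on the logo prompts reaches a threshold. The prompt sets, the 0.9 threshold, and the function names are assumptions for illustration, the logit scale of 100 is chosen to resemble CLIP's temperature-scaled logits, and the 2-dimensional vectors stand in for real CLIP embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def logo_probability(image_emb: Sequence[float], logo_prompts: Sequence[Sequence[float]],
                     other_prompts: Sequence[Sequence[float]], logit_scale: float = 100.0) -> float:
    """Softmax mass assigned to the logo prompts, CLIP zero-shot style."""
    prompts = list(logo_prompts) + list(other_prompts)
    logits = [logit_scale * cosine(image_emb, p) for p in prompts]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return sum(exps[: len(logo_prompts)]) / sum(exps)


def filter_logo_images(image_embs: Dict[str, Sequence[float]], logo_prompts: Sequence[Sequence[float]],
                       other_prompts: Sequence[Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Keep the ids of images whose logo probability reaches the (assumed) threshold."""
    return [img_id for img_id, emb in image_embs.items()
            if logo_probability(emb, logo_prompts, other_prompts) >= threshold]


# Toy 2-D embeddings: axis 0 ~ "a logo" prompt, axis 1 ~ "a photo of a scene" prompt.
logo_prompts = [[1.0, 0.0]]
other_prompts = [[0.0, 1.0]]
images = {
    "img_logo": [0.95, 0.05],
    "img_photo": [0.1, 0.9],
    "img_ambiguous": [0.5, 0.5],
}
kept = filter_logo_images(images, logo_prompts, other_prompts)
print(kept)  # ['img_logo']
```

The ambiguous image splits its probability evenly (0.5) between the two prompt groups, so a high threshold drops it; lowering the threshold would trade precision of the bank for coverage.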
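
The metrics reported in Figures 4 and 5 can be made concrete with a few lines. The toy labels below are invented for illustration and are not results from the paper; they show why a logo that pulls predictions toward a target class lowers that class's precision (many non-target images are now predicted as the target), and why a logo correlated with "harmless" lowers the true positive rate on hateful content.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def true_positive_rate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """Fraction of truly positive samples (e.g. hateful memes) predicted as positive."""
    preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds) / len(preds) if preds else float("nan")


def precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """Fraction of samples predicted as cls that truly belong to cls."""
    truths = [t for t, p in zip(y_true, y_pred) if p == cls]
    return sum(t == cls for t in truths) / len(truths) if truths else float("nan")


# Invented toy labels (not paper results).
# ImageNet-style case: a logo correlated with "traffic light" is pasted on every image.
y_true = ["traffic light", "parking meter", "cat", "dog"]
clean_pred = ["traffic light", "parking meter", "cat", "dog"]
logo_pred = ["traffic light"] * 4
print(precision(y_true, clean_pred, "traffic light"))  # 1.0
print(precision(y_true, logo_pred, "traffic light"))   # 0.25
print(accuracy(y_true, logo_pred))                     # 0.25

# Content-moderation case: a logo correlated with "harmless" is pasted on every meme.
hm_true = ["hateful", "hateful", "harmless", "harmless"]
hm_logo_pred = ["harmless"] * 4
print(true_positive_rate(hm_true, hm_logo_pred, "hateful"))  # 0.0
print(accuracy(hm_true, hm_logo_pred))                       # 0.5
```

Note that accuracy on a balanced moderation set can stay near 0.5 even when the true positive rate collapses to zero, which is why both numbers are worth reporting.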