Table of Contents
Fetching ...

Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection

Jiayi Zhu, Yihao Huang, Yue Cao, Xiaojun Jia, Qing Guo, Felix Juefei-Xu, Geguang Pu, Bin Wang

TL;DR

This work tackles geo-privacy leakage from LVLMs by introducing GeoSTA, a semantic-aware typographic attack that extends text outside the image to mislead geolocation inference without altering visual content. GeoSTA operates in two stages: first selecting a deceptive target location and constructing an instructional text, then using feedback from the target LVLM to generate an explanatory refinement that reconciles visuals with the text claim. The method relies on a coordination between an attack model and a black-box target model via three prompts, enabling semantically coherent yet deceptive text extensions. Across IconicLandmark, GoogleLandmark, and StreetView, and five commercial LVLMs, GeoSTA achieves near-perfect attack success rates, outperforming both typography-based and noise-based baselines and offering a practical, visually-preserving privacy protection tool.

Abstract

Large Visual Language Models (LVLMs) now pose a serious yet overlooked privacy threat, as they can infer a social media user's geolocation directly from shared images, leading to unintended privacy leakage. While adversarial image perturbations provide a potential direction for geo-privacy protection, they require relatively strong distortions to be effective against LVLMs, which noticeably degrade visual quality and diminish an image's value for sharing. To overcome this limitation, we identify typographical attacks as a promising direction for protecting geo-privacy by adding text extension outside the visual content. We further investigate which textual semantics are effective in disrupting geolocation inference and design a two-stage, semantics-aware typographical attack that generates deceptive text to protect user privacy. Extensive experiments across three datasets demonstrate that our approach significantly reduces geolocation prediction accuracy of five state-of-the-art commercial LVLMs, establishing a practical and visually-preserving protection strategy against emerging geo-privacy threats.

Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection

TL;DR

This work tackles geo-privacy leakage from LVLMs by introducing GeoSTA, a semantic-aware typographic attack that extends text outside the image to mislead geolocation inference without altering visual content. GeoSTA operates in two stages: first selecting a deceptive target location and constructing an instructional text, then using feedback from the target LVLM to generate an explanatory refinement that reconciles visuals with the text claim. The method relies on a coordination between an attack model and a black-box target model via three prompts, enabling semantically coherent yet deceptive text extensions. Across IconicLandmark, GoogleLandmark, and StreetView, and five commercial LVLMs, GeoSTA achieves near-perfect attack success rates, outperforming both typography-based and noise-based baselines and offering a practical, visually-preserving privacy protection tool.

Abstract

Large Visual Language Models (LVLMs) now pose a serious yet overlooked privacy threat, as they can infer a social media user's geolocation directly from shared images, leading to unintended privacy leakage. While adversarial image perturbations provide a potential direction for geo-privacy protection, they require relatively strong distortions to be effective against LVLMs, which noticeably degrade visual quality and diminish an image's value for sharing. To overcome this limitation, we identify typographical attacks as a promising direction for protecting geo-privacy by adding text extension outside the visual content. We further investigate which textual semantics are effective in disrupting geolocation inference and design a two-stage, semantics-aware typographical attack that generates deceptive text to protect user privacy. Extensive experiments across three datasets demonstrate that our approach significantly reduces geolocation prediction accuracy of five state-of-the-art commercial LVLMs, establishing a practical and visually-preserving protection strategy against emerging geo-privacy threats.

Paper Structure

This paper contains 16 sections, 4 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Overview of geo-privacy leakage (left) and protection achieved by our proposed GeoSTA (right).
  • Figure 2: Influence of target location selection on geo-privacy protection.
  • Figure 3: Effect of instructional enhancement on geo-privacy protection.
  • Figure 4: Effect of explanatory statement on geo-privacy protection.
  • Figure 5: Framework of our two-stage typographic attack GeoSTA.
  • ...and 1 more figures