GeoShield: Safeguarding Geolocation Privacy from Vision-Language Models via Adversarial Perturbations

Xinwei Liu, Xiaojun Jia, Yuan Xun, Simeng Qin, Xiaochun Cao

TL;DR

Vision-Language Models can reveal precise geolocations from shared images, raising privacy concerns. GeoShield introduces a three-module framework, comprising Geographical and Non-Geographical Feature Disentanglement (GNFD), Geographical Exposure Element Identification (Geo-EE), and Perturbation Scale Adaptive Enhancement (PSAE), to robustly perturb high-resolution images while preserving semantic content. The method demonstrates superior geoprivacy protection in black-box settings on the Google Street View and Im2GPS3k datasets against multiple VLMs and open-source models, with theoretical analyses supporting feature disentanglement, convergence, and privacy guarantees. Practical prompts and extensive experiments indicate GeoShield’s viability for real-world deployment and its potential as a baseline for broader privacy protections against advanced multimodal models.

Abstract

Vision-Language Models (VLMs) such as GPT-4o now demonstrate a remarkable ability to infer users' locations from publicly shared images, posing a substantial risk to geoprivacy. Although adversarial perturbations offer a potential defense, current methods are ill-suited for this scenario: they often perform poorly on high-resolution images and under low perturbation budgets, and may introduce irrelevant semantic content. To address these limitations, we propose GeoShield, a novel adversarial framework designed for robust geoprivacy protection in real-world scenarios. GeoShield comprises three key modules: a feature disentanglement module that separates geographical and non-geographical information, an exposure element identification module that pinpoints geo-revealing regions within an image, and a scale-adaptive enhancement module that jointly optimizes perturbations at both global and local levels to ensure effectiveness across resolutions. Extensive experiments on challenging benchmarks show that GeoShield consistently surpasses prior methods in black-box settings, achieving strong privacy protection with minimal impact on visual or semantic quality. To our knowledge, this work is the first to explore adversarial perturbations for defending against geolocation inference by advanced VLMs, providing a practical and effective solution to escalating privacy concerns.
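
The idea of a budget-bounded perturbation that is optimized jointly at a global and a local level can be illustrated with a small sketch. This is not the authors' implementation: the linear map `W` stands in for a real image encoder, the binary `mask` stands in for the output of an exposure element identification step, and the function names, the feature-gap loss, and the default budget of 8/255 are all assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def feature_gap_grad(W, x_adv, x_ref):
    """Gradient w.r.t. x_adv of 0.5 * ||W @ x_adv - W @ x_ref||^2.

    W is a toy linear 'encoder' (an assumption, not a real VLM backbone).
    """
    return W.T @ (W @ (x_adv - x_ref))


def protect(image, W, mask, epsilon=8 / 255, step=1 / 255,
            iters=20, local_weight=1.0, seed=0):
    """Iterative sign-gradient ascent under an L-infinity budget.

    Pushes the features of the perturbed image away from those of the
    original, using a global term (whole image) plus a local term
    restricted to the hypothetical geo-revealing region given by mask.
    """
    rng = np.random.default_rng(seed)
    x = image.ravel()
    m = mask.ravel().astype(x.dtype)
    # Random start: the feature-gap gradient is exactly zero at delta = 0.
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    for _ in range(iters):
        x_adv = np.clip(x + delta, 0.0, 1.0)
        g_global = feature_gap_grad(W, x_adv, x)
        # Local term: same loss evaluated on the masked region only.
        g_local = m * feature_gap_grad(W, m * x_adv, m * x)
        g = g_global + local_weight * g_local
        # Ascent step, then project back onto the epsilon-ball and [0, 1].
        delta = np.clip(delta + step * np.sign(g), -epsilon, epsilon)
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return (x + delta).reshape(image.shape)
```

In this sketch the two projections keep every pixel within `epsilon` of the original and inside the valid intensity range, which is the sense in which the perturbation budget is "low". A real system would replace `feature_gap_grad` with gradients from one or more surrogate encoders, since the black-box setting described above gives no access to the target model.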

Paper Structure

This paper contains 47 sections, 20 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: Public image sharing exposes users to geoprivacy threats, as LVLMs can accurately infer locations from visual content. GeoShield applies imperceptible perturbations to disrupt such inference and safeguard user privacy.
  • Figure 3: Overview of the GeoShield framework. GeoShield consists of three modules (GNFD, Geo-EE, and PSAE) that collaboratively suppress geographical cues while preserving semantic integrity in high-resolution images.
  • Figure 4: Visualization of different protected images.
  • Figure 5: Average distance under untargeted attacks for two perturbation budgets.
  • Figure 6: GeoShield maintains geoprivacy protection across varying levels of JPEG compression and Gaussian blur.
  • ...and 11 more figures