GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents

Xinyu Zhang, Yixin Wu, Boyang Zhang, Chenhao Lin, Chao Shen, Michael Backes, Yang Zhang

TL;DR

The paper investigates privacy risks from geolocating common social media images using an autonomous agent, GEO-Detective, which combines LVLM reasoning with external tools through a four-stage pipeline (visual analysis, strategy execution, results synthesis, iterative refinement). It introduces a difficulty-based strategy selection mechanism and modules like visual feature segmentation and visual reverse search, achieving higher accuracy than strong LVLM baselines, especially on challenging images, and significantly reducing unknown predictions when external clues are available. It provides extensive ablations, assesses generalizability with multiple models, and evaluates defense strategies, finding watermarking to be the most effective at suppressing geolocation, while highlighting the need for robust privacy safeguards. Overall, the work demonstrates the amplified privacy risks posed by agentic geolocation and offers a foundation for developing and evaluating defenses against such tooling.

Abstract

Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and associated privacy risks, we present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty and is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline LVLMs overall, particularly on images lacking visible geographic features. In country-level geolocation tasks, it achieves an improvement of over 11.1% compared to baseline LVLMs, and even at finer-grained levels, it still provides around a 5.2% performance gain. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that GEO-Detective exhibits stronger robustness, highlighting the need for more effective privacy safeguards.
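
The four-stage loop described above (visual analysis, strategy execution, result synthesis, iterative refinement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_agent`, `STRATEGIES`, and `escalate`, the three difficulty labels, the confidence threshold, and the escalate-on-failure refinement rule are all assumptions made for illustration. The LVLM and the tools are passed in as plain callables so the control flow can be read without any model or network access.

```python
from typing import Callable, Dict, List, Optional, Tuple

Clues = List[str]
Prediction = Tuple[Optional[str], float]  # (location or None for "unknown", confidence)

# Hypothetical mapping from image difficulty to the tools worth invoking.
STRATEGIES: Dict[str, List[str]] = {
    "easy": [],                                  # rely on visual reasoning alone
    "medium": ["segmentation"],                  # add visual feature segmentation
    "hard": ["segmentation", "reverse_search"],  # add external clues as well
}


def escalate(difficulty: str) -> str:
    """Assumed refinement rule: treat the image as one level harder."""
    order = ["easy", "medium", "hard"]
    return order[min(order.index(difficulty) + 1, len(order) - 1)]


def run_agent(
    image: str,
    analyze: Callable[[str], Tuple[str, Clues]],
    tools: Dict[str, Callable[[str], Clues]],
    synthesize: Callable[[Clues], Prediction],
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> Prediction:
    # Stage 1: visual analysis yields a difficulty estimate and initial clues.
    difficulty, clues = analyze(image)
    clues = list(clues)
    used = set()
    prediction: Prediction = (None, 0.0)

    for _ in range(max_rounds):
        # Stage 2: strategy execution runs each selected tool at most once.
        for name in STRATEGIES[difficulty]:
            if name not in used:
                clues.extend(tools[name](image))
                used.add(name)

        # Stage 3: result synthesis turns the collected clues into a prediction.
        prediction = synthesize(clues)
        location, confidence = prediction
        if location is not None and confidence >= threshold:
            break

        # Stage 4: iterative refinement retries with a stronger strategy.
        difficulty = escalate(difficulty)

    return prediction
```

With stub callables, an image first judged "easy" is only resolved once the loop has escalated far enough to call the reverse-search tool, which mirrors the claim that external clues reduce "unknown" predictions.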

Paper Structure

This paper contains 26 sections, 1 equation, 7 figures, 12 tables.

Figures (7)

  • Figure 1: System design of GEO-Detective. The agent takes images shared by users on online platforms as input and processes them through four stages: visual feature analysis, strategy execution, result synthesis, and iterative refinement. During this process, users' location information may be inferred, leading to potential privacy risks.
  • Figure 2: Prompt optimization and GeoCLIP similarity heatmaps. The left column shows the initial prompts, the middle column presents the optimized prompts, and the right column compares the heatmaps generated from the ground-truth prompts and the optimized prompts. The image IDs from the MP16 dataset are displayed at the bottom. In the optimized prompts, elements highlighted in green correspond to the five predefined categories listed in the appendix table of geographic elements (architectural, infrastructure, environmental, urban planning, signage), while elements in red denote additional geographic cues introduced by the LLM during the optimization process.
  • Figure 3: Accuracy comparison of the baseline LVLM and GEO-Detective across difficulty levels.
  • Figure 4: Comparison between our LLM-based geographic feature segmentation (left) and YOLOv5 (right) on an example from the MP16 dataset (image ID: 1c_8c_311262130).
  • Figure 5: Result of simulating human web search operations. (Since automated operations on Google often trigger human verification, Yandex was used for testing instead.)
  • ...and 2 more figures