Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
Kyle Buettner, Sina Malakouti, Xiang Lorraine Li, Adriana Kovashka
TL;DR
The paper addresses geographical domain shifts in object recognition by injecting geographically diverse descriptive knowledge into prompts for CLIP-based models. It combines geography-aware prompts that draw on CLIP's internal knowledge (CountryInPrompt), external descriptive knowledge elicited from LLMs (CountryLLM), and a geography knowledge regularization term, yielding class representations that generalize across geographies. Empirical results on DollarStreet and GeoNet show that geography-aware prompting yields meaningful gains on target data from Africa, Asia, and the Americas, and can be competitive with, or exceed, some baselines trained on few-shot target data, demonstrating practical benefits for geographically robust vision-language systems. The work highlights the importance of aligning vision-language representations with diverse regional knowledge and provides a scalable pathway toward fairer and more robust real-world deployments.
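The combination of CLIP-internal and LLM-derived knowledge described above can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation. It assumes that CountryInPrompt uses a template like "a photo of a {class} in {country}.", that CountryLLM appends an LLM-generated descriptor as "..., which {descriptor}.", and that a class representation is the renormalized mean of all prompt embeddings. `toy_encode` is an invented stand-in for CLIP's text encoder (a letter-count vector) so the example runs without model weights, and all function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm, as CLIP does before cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_encode(text: str) -> Vector:
    """Hypothetical stand-in for CLIP's text encoder: counts of letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def build_geo_prompts(cls: str, countries: Sequence[str],
                      descriptors: Dict[str, List[str]]) -> List[str]:
    """Assumed templates: one country prompt per country, one prompt per LLM descriptor."""
    prompts = [f"a photo of a {cls} in {c}." for c in countries]  # CountryInPrompt-style
    for c in countries:
        for d in descriptors.get(c, []):
            prompts.append(f"a photo of a {cls} in {c}, which {d}.")  # CountryLLM-style
    return prompts


def class_embedding(prompts: Sequence[str], encode: Callable[[str], Vector]) -> Vector:
    """Ensemble prompts: average normalized prompt embeddings, then renormalize."""
    embs = [normalize(encode(p)) for p in prompts]
    dim = len(embs[0])
    mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return normalize(mean)


def zero_shot_predict(image_emb: Sequence[float], class_embs: Dict[str, Vector]) -> str:
    """Return the class whose text embedding is most cosine-similar to the image."""
    img = normalize(image_emb)
    return max(class_embs, key=lambda k: sum(a * b for a, b in zip(img, class_embs[k])))
```

In a real pipeline, `encode` would be CLIP's text encoder and the image embedding would come from CLIP's image encoder; the letter-count encoder only makes the control flow runnable.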
Abstract
Existing object recognition models have been shown to lack robustness in diverse geographical scenarios due to domain shifts in design and context. Class representations need to be adapted to more accurately reflect an object concept under these shifts. In the absence of training data from target geographies, we hypothesize that geographically diverse descriptive knowledge of categories can enhance robustness. For this purpose, we explore the feasibility of probing a large language model for geography-based object knowledge, and we examine the effects of integrating this knowledge into zero-shot and learnable soft prompting with CLIP. Within this exploration, we propose geography knowledge regularization to ensure that soft prompts trained on a source set of geographies generalize to an unseen target set. When training only on European data, accuracy gains over prompting baselines on DollarStreet reach up to +2.8/+1.2/+1.6 on target data from Africa/Asia/the Americas, and +4.6 overall on the hardest classes. Performance is competitive with few-shot target training, and we provide analysis to guide future study of geographical robustness.
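Geography knowledge regularization can be pictured as an extra loss term added to standard soft-prompt training on source geographies. The following is a minimal sketch under stated assumptions, not the paper's exact formulation. It assumes a KgCoOp-style penalty: the mean squared L2 distance between the normalized class embeddings produced by the learned soft prompts and fixed class embeddings built from geography knowledge (for example, CountryLLM prompts), scaled by a hypothetical weight `lam` and added to cross-entropy on source-geography images. The temperature of 0.01 (a logit scale of 100) is also an assumption, and the names `geo_knowledge_reg` and `training_loss` are illustrative. The code only computes loss values; actual training would backpropagate through CLIP's text encoder into the soft prompt context vectors.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(image_emb: Sequence[float], class_embs: Sequence[Sequence[float]],
                  temperature: float = 0.01) -> List[float]:
    """CLIP-style logits: cosine similarity to each class divided by a temperature."""
    img = normalize(image_emb)
    return [sum(a * b for a, b in zip(img, normalize(t))) / temperature for t in class_embs]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def geo_knowledge_reg(learned: Sequence[Sequence[float]],
                      knowledge: Sequence[Sequence[float]]) -> float:
    """Assumed penalty: mean squared L2 distance between normalized learned
    class embeddings and fixed geography-knowledge class embeddings."""
    total = 0.0
    for w, k in zip(learned, knowledge):
        wn, kn = normalize(w), normalize(k)
        total += sum((a - b) ** 2 for a, b in zip(wn, kn))
    return total / len(learned)


def training_loss(image_embs: Sequence[Sequence[float]], labels: Sequence[int],
                  learned: Sequence[Sequence[float]], knowledge: Sequence[Sequence[float]],
                  lam: float = 1.0) -> float:
    """Mean cross-entropy on source images plus the weighted knowledge penalty."""
    ce = sum(cross_entropy(cosine_logits(x, learned), y)
             for x, y in zip(image_embs, labels)) / len(labels)
    return ce + lam * geo_knowledge_reg(learned, knowledge)
```

Setting `lam` to zero recovers plain cross-entropy soft-prompt training on the source geographies; larger values keep the learned class embeddings closer to the geography knowledge, which is the intended route to generalizing to unseen target geographies.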
