FewUser: Few-Shot Social User Geolocation via Contrastive Learning
Menglin Li, Kwan Hui Lim
TL;DR
The paper tackles the scarcity of geotagged training data in social user geolocation by proposing FewUser, a contrastive-learning framework that aligns user and location representations using little or no training data. It introduces two metadata-rich datasets, TwiU and FliU, and combines a user representation module with a geographical prompting module to bridge PLM knowledge and geographic data. Training uses dual objectives (contrastive and matching losses) with hard negative mining, achieving substantial zero-shot and few-shot gains (e.g., 16.75%–27.73% zero-shot accuracy, and absolute improvements of 26.95% and 41.62% in the 1-shot setting on TwiU and FliU, respectively). Extensive ablations reveal the importance of input design, prompt type (hard, soft, or semi-soft), and backbone choice, offering practical guidance for future work on scalable, prompt-enhanced geolocation under limited supervision.
Abstract
To address the scarcity of geotagged data for social user geolocation, we propose FewUser, a novel framework for Few-shot social User geolocation. We incorporate a contrastive learning strategy between users and locations to improve geolocation performance with no or limited training data. FewUser features a user representation module that harnesses a pre-trained language model (PLM) and a user encoder to process and fuse diverse social media inputs effectively. To bridge the gap between the PLM's knowledge and geographical data, we introduce a geographical prompting module with hard, soft, and semi-soft prompts to enhance the encoding of location information. Contrastive learning is implemented through a contrastive loss and a matching loss, complemented by a hard negative mining strategy to refine the learning process. We construct two datasets, TwiU and FliU, which contain richer metadata than existing benchmarks, to evaluate FewUser. Extensive experiments demonstrate that FewUser significantly outperforms state-of-the-art methods in both zero-shot and various few-shot settings, achieving absolute improvements of 26.95% and 41.62% on TwiU and FliU, respectively, with only one training sample per class. We further conduct a comprehensive analysis of how user representation affects geolocation performance and of the effectiveness of FewUser's components, offering valuable insights for future research in this area.
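The abstract does not give the loss formulas, so the following is only a rough sketch under stated assumptions, not FewUser's implementation. It assumes the contrastive loss is an InfoNCE-style cross-entropy between each user embedding and a set of candidate location embeddings, and that the matching loss is a binary classifier scored on each true (user, location) pair and on that user's most similar wrong location (the hard negative). The function names, the temperature of 0.1, and the linear matching head (`w`, `b`) are hypothetical. In FewUser, the user vectors would plausibly come from the PLM-based user representation module and the location vectors from the geographical prompting module. Under this setup, zero-shot prediction reduces to picking the most similar location.

```python
import math

# Hypothetical sketch: loss forms, names, and hyperparameters are assumptions,
# not the FewUser implementation.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def similarity_matrix(users, locations, temperature=0.1):
    """Cosine similarity of every user to every candidate location, divided by temperature."""
    us = [normalize(u) for u in users]
    ls = [normalize(l) for l in locations]
    return [[dot(u, l) / temperature for l in ls] for u in us]

def contrastive_loss(sims, labels):
    """Cross-entropy pulling each user toward its true location and away from the others."""
    total = 0.0
    for row, y in zip(sims, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[y]
    return total / len(labels)

def hard_negatives(sims, labels):
    """For each user, the index of the most similar wrong location (needs >= 2 locations)."""
    return [max((j for j in range(len(row)) if j != y), key=lambda j: row[j])
            for row, y in zip(sims, labels)]

def matching_loss(users, locations, labels, negatives, w, b):
    """Binary cross-entropy of a hypothetical linear matching head on true pairs
    (target 1) and hard-negative pairs (target 0)."""
    def logit(u, l):
        return dot(w, [a * c for a, c in zip(u, l)]) + b
    def bce_with_logit(z, target):
        # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))]
        return max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))
    losses = []
    for u, y, n in zip(users, labels, negatives):
        losses.append(bce_with_logit(logit(u, locations[y]), 1))
        losses.append(bce_with_logit(logit(u, locations[n]), 0))
    return sum(losses) / len(losses)

# Toy usage: two users, three candidate locations (all vectors made up).
users = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
locations = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 1.0]]
labels = [0, 1]  # index of each user's true location

sims = similarity_matrix(users, locations)
negs = hard_negatives(sims, labels)
total = contrastive_loss(sims, labels) + matching_loss(
    users, locations, labels, negs, w=[1.0, 1.0, 1.0], b=0.0)
preds = [max(range(len(row)), key=row.__getitem__) for row in sims]  # zero-shot argmax
```

Mining the hardest wrong location per user, rather than a random one, concentrates the matching objective on the confusable pairs (for example, nearby or similarly described places) that the contrastive term alone separates least.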
