Leveraging Contrastive Learning for Few-shot Geolocation of Social Posts
Menglin Li, Kwan Hui Lim
TL;DR
This paper tackles the problem of geolocating social posts when labeled data are scarce and location distributions are highly imbalanced. It introduces ContrastGeo, a few-shot framework that combines Tweet-Location Contrastive Learning (TLC) to align tweet and location representations with a Tweet-Location Matching (TLM) objective that uses online hard negative mining, all within a flexible fusion module to form joint representations. The approach is evaluated on three public datasets and shows consistent, substantial improvements over state-of-the-art baselines in 1–16 shot settings, aided by fine-tuning a pre-trained language model and careful design choices such as pooling and prompts. The work highlights the effectiveness of contrastive learning for cross-entity alignment in geolocation and suggests broader applicability to other social analysis tasks in data-scarce regimes.
Abstract
Social geolocation, the task of predicting the originating locations of social media posts, is an important problem. However, it is challenging because it requires a substantial volume of training data with well-annotated labels. These issues are exacerbated by new or less popular locations that have insufficient labels, which in turn leads to an imbalanced dataset. In this paper, we propose \textbf{ContrastGeo}, a \textbf{Contrast}ive learning enhanced framework for few-shot social \textbf{Geo}location. Specifically, a Tweet-Location Contrastive learning objective is introduced to align the representations of tweets and locations within tweet-location pairs. To capture the correlations between tweets and locations, a Tweet-Location Matching objective is further incorporated into the framework and refined via an online hard negative mining approach. We also develop three fusion strategies with various fusion encoders to better generate joint representations of tweets and locations. Comprehensive experiments on three social media datasets highlight ContrastGeo's superior performance over several state-of-the-art baselines in few-shot social geolocation.
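
The abstract does not give the exact form of the two objectives, so the following is only a minimal sketch of the general idea, assuming a symmetric InfoNCE-style contrastive loss over a batch of paired tweet and location embeddings and a simple similarity-based rule for picking hard negatives. The function names, the temperature value, and the assumption that each location appears only once per batch are illustrative choices, not details taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def similarity_matrix(tweets, locations, temperature=0.1):
    """Temperature-scaled cosine similarity; entry [i][j] compares tweet i with location j."""
    t = [normalize(v) for v in tweets]
    l = [normalize(v) for v in locations]
    return [[dot(a, b) / temperature for b in l] for a in t]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(tweets, locations, temperature=0.1):
    """Symmetric loss (assumed form): tweet-to-location plus location-to-tweet."""
    sim = similarity_matrix(tweets, locations, temperature)
    sim_t = [list(col) for col in zip(*sim)]  # transpose for the reverse direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))

def hardest_negatives(tweets, locations, temperature=0.1):
    """For each tweet, index of the most similar non-matching location in the batch."""
    sim = similarity_matrix(tweets, locations, temperature)
    result = []
    for i, row in enumerate(sim):
        candidates = [j for j in range(len(row)) if j != i]
        result.append(max(candidates, key=lambda j: row[j]))
    return result

# Toy batch: tweet i is paired with location i (hypothetical 2-d embeddings).
tweets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
locations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned = contrastive_loss(tweets, locations)
shuffled = contrastive_loss(tweets, [locations[1], locations[2], locations[0]])
negatives = hardest_negatives(tweets, locations)
```

In this toy batch the loss is lower when each tweet embedding matches its paired location than when the pairs are shuffled, and the mined negatives are the locations a matching classifier would most easily confuse with the true one. In practice the embeddings would come from the fine-tuned language model rather than hand-written vectors.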
