Self-Supervised Representation Learning for Geospatial Objects: A Survey
Yile Chen, Weiming Huang, Kaiqi Zhao, Yue Jiang, Gao Cong
TL;DR
This survey reviews self-supervised learning (SSL) techniques tailored to geospatial objects, categorizing them by three geometric types (Points, Polylines, Polygons) and organizing them around the predictive and contrastive SSL paradigms. It highlights encoder choices (graph neural networks (GNNs), sequential models, pre-trained models) and provides detailed taxonomies of intrinsic attributes, context information, and applications across POIs, road networks, trajectories, and urban regions. The paper emphasizes multi-type learning, the emergence of geospatial foundation models, and task-specific SSL approaches, and it outlines challenges such as benchmark scarcity, multimodal fusion, and privacy concerns. It concludes with future directions toward standardized benchmarks, enhanced multimodality, and scalable foundation-model frameworks for GeoAI.
Abstract
The proliferation of various data sources in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across a wide range of geospatial applications. However, geospatial data, which is inherently linked to geospatial objects, is often heterogeneous, which necessitates specialized fusion and representation strategies, and it typically has sparse labels for downstream tasks. Consequently, there is a growing demand for techniques that can effectively leverage geospatial data without heavy reliance on task-specific labels and model designs. This need aligns with the principles of self-supervised learning (SSL), which has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision. This paper presents a comprehensive and up-to-date survey of SSL techniques applied to or developed for geospatial objects of three primary vector geometry types: Point, Polyline, and Polygon. We systematically categorize SSL techniques into predictive and contrastive methods and analyze how they are adapted to different data types for representation learning across various downstream tasks. Furthermore, we examine emerging trends in SSL for geospatial objects, particularly the gradual progress towards geospatial foundation models. Finally, we discuss key challenges in current research and outline promising directions for future investigation. By offering a structured analysis of existing studies, this paper aims to inspire continued progress in integrating SSL with geospatial objects and, in the longer term, in developing geospatial foundation models.
