GeoAI Reproducibility and Replicability: a computational and spatial perspective
Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Peter Kedron
TL;DR
This paper tackles reproducibility and replicability (R&R) in GeoAI, arguing that both computational and spatial factors complicate reliable inference. Using Mars crater detection with the MViTv2 vision transformer, it systematically varies training data size, random seeds, and geographic partitions to quantify how results drift across conditions, and it introduces a replicability map that integrates spatial autocorrelation and heterogeneity. The key findings are that larger training sets improve $mAP50$ only up to about 2,000 samples, after which gains plateau; that fixed random seeds yield more stable results; and that spatial replicability varies by location, with strong autocorrelation along latitude but weaker effects along longitude. The study underscores the need for detailed documentation, open-science practices, and spatially aware replication measures to ensure GeoAI findings generalize across heterogeneous geographies and data regimes.
Abstract
GeoAI has emerged as an exciting interdisciplinary research area that combines spatial theories and data with cutting-edge AI models to address geospatial problems in a novel, data-driven manner. While GeoAI research has flourished in the GIScience literature, its reproducibility and replicability (R&R), fundamental principles that determine the reusability, reliability, and scientific rigor of research findings, have rarely been discussed. This paper provides an in-depth analysis of this topic from both computational and spatial perspectives. We first categorize the major goals for reproducing GeoAI research, namely, validation (repeatability), learning and adapting the method to solve a similar or new problem (reproducibility), and examining the generalizability of the research findings (replicability). Each of these goals requires a different level of understanding of GeoAI, as well as different methods to ensure its success. We then discuss the factors that may cause a lack of R&R in GeoAI research, with an emphasis on (1) the selection and use of training data; (2) the uncertainty that resides in the GeoAI model design, training, deployment, and inference processes; and, more importantly, (3) the inherent spatial heterogeneity of geospatial data and processes. We use a deep learning-based image analysis task as an example to demonstrate the uncertainty and spatial variance in results caused by these factors. The findings reiterate the importance of knowledge sharing, as well as the generation of a "replicability map" that takes spatial autocorrelation and spatial heterogeneity into account in quantifying the spatial replicability of GeoAI research.
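To make the notion of spatial autocorrelation in model performance concrete, the following is a minimal sketch of global Moran's I computed over a grid of per-region scores. It is an illustration under stated assumptions, not the procedure used in this paper: the function name `morans_i`, the rook (edge-sharing) neighbor definition, and the two example grids are all hypothetical, and the scores are made-up values rather than results.

```python
from typing import List


def morans_i(grid: List[List[float]]) -> float:
    """Global Moran's I for a rectangular grid with rook (edge-sharing) neighbors.

    Each cell holds one score (e.g. a detection metric for that region).
    Values near +1 mean neighboring cells have similar scores; values near -1
    mean neighboring cells have dissimilar scores.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n

    # Total squared deviation from the mean (denominator of Moran's I).
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")

    cross_product = 0.0  # sum over neighbor pairs of deviation products
    weight_sum = 0       # number of directed neighbor pairs (binary weights)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross_product += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1

    return (n / weight_sum) * (cross_product / denom)


# Hypothetical scores: rows stand for latitude bands, columns for longitude bands.
# Scores change with latitude only, so neighbors tend to be similar.
banded = [
    [0.9, 0.9, 0.9, 0.9],
    [0.7, 0.7, 0.7, 0.7],
    [0.5, 0.5, 0.5, 0.5],
    [0.3, 0.3, 0.3, 0.3],
]

# Hypothetical alternating pattern: every neighbor differs.
checkerboard = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print("banded:", morans_i(banded))
    print("checkerboard:", morans_i(checkerboard))
```

In this sketch the banded grid yields a positive statistic and the checkerboard a negative one. A single global value summarizes clustering but says nothing about where performance differs, which is why a map-based view of replicability also needs to account for spatial heterogeneity.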
