GeoAI Reproducibility and Replicability: a computational and spatial perspective

Wenwen Li; Chia-Yu Hsu; Sizhe Wang; Peter Kedron

TL;DR

This paper tackles reproducibility and replicability (R&R) in GeoAI, arguing that both computational and spatial factors complicate reliable inference. Using Mars crater detection with the MViTv2 vision transformer, it systematically varies training data size, random seeds, and geographic partitions to quantify how results drift across conditions, and it introduces a replicability map that integrates spatial autocorrelation and heterogeneity. Key findings show that larger training sets improve mAP50 up to about 2,000 samples, after which gains plateau; fixed random seeds yield more stable results; and spatial replicability varies by location, with strong latitude-based autocorrelation but weaker longitude effects. The study underscores the need for detailed documentation, open-science practices, and spatially aware replication measures to ensure GeoAI findings generalize across heterogeneous geographies and data regimes.
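As a rough illustration of what "latitude-based autocorrelation" could mean in practice, the sketch below computes global Moran's I over per-band accuracy scores. It is a minimal sketch under stated assumptions, not the paper's replicability-map procedure: the function names `morans_i` and `chain_weights`, the adjacency-only weight scheme, and the score values are all hypothetical and chosen only for demonstration.

```python
def morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Binary weights for n bands ordered along one axis.

    Assumption: only directly adjacent bands count as neighbors.
    """
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


# Made-up accuracy scores for six adjacent latitude bands (not paper data).
smooth_scores = [0.60, 0.62, 0.65, 0.70, 0.74, 0.78]
alternating_scores = [0.60, 0.80, 0.60, 0.80, 0.60, 0.80]

w = chain_weights(6)
i_smooth = morans_i(smooth_scores, w)            # positive: neighbors are alike
i_alternating = morans_i(alternating_scores, w)  # negative: neighbors differ
```

A positive value indicates that neighboring bands tend to have similar scores, while a negative value indicates that neighbors tend to differ; a value near zero suggests no clear spatial pattern under the chosen weights.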

Abstract

GeoAI has emerged as an exciting interdisciplinary research area that combines spatial theories and data with cutting-edge AI models to address geospatial problems in a novel, data-driven manner. While GeoAI research has flourished in the GIScience literature, its reproducibility and replicability (R&R), fundamental principles that determine the reusability, reliability, and scientific rigor of research findings, have rarely been discussed. This paper aims to provide an in-depth analysis of this topic from both computational and spatial perspectives. We first categorize the major goals for reproducing GeoAI research, namely, validation (repeatability), learning and adapting the method for solving a similar or new problem (reproducibility), and examining the generalizability of the research findings (replicability). Each of these goals requires a different level of understanding of GeoAI, as well as different methods to ensure its success. We then discuss the factors that may cause the lack of R&R in GeoAI research, with an emphasis on (1) the selection and use of training data; (2) the uncertainty that resides in the GeoAI model design, training, deployment, and inference processes; and, more importantly, (3) the inherent spatial heterogeneity of geospatial data and processes. We use a deep learning-based image analysis task as an example to demonstrate the uncertainty and spatial variance in results caused by these different factors. The findings reiterate the importance of knowledge sharing, as well as the generation of a "replicability map" that takes spatial autocorrelation and spatial heterogeneity into consideration when quantifying the spatial replicability of GeoAI research.
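The effect of seeding on run-to-run uncertainty can be illustrated with a toy experiment. The following is a minimal sketch, not the authors' code: `train_once` is a hypothetical stand-in that returns a noisy score instead of training a model, and the base score of 0.70 and noise level of 0.02 are arbitrary assumptions.

```python
import random
import statistics


def train_once(seed=None):
    """Hypothetical stand-in for one training run; returns a noisy score.

    Assumption: a real pipeline would also seed NumPy and the deep
    learning framework in use, not only Python's `random` module.
    """
    rng = random.Random(seed)  # seed=None draws from OS entropy
    return 0.70 + rng.gauss(0.0, 0.02)


def summarize(scores):
    """Mean and population standard deviation over repeated runs."""
    return statistics.mean(scores), statistics.pstdev(scores)


# 20 runs with the same seed versus 20 runs with no seed set.
fixed_runs = [train_once(seed=42) for _ in range(20)]
random_runs = [train_once() for _ in range(20)]

fixed_mean, fixed_std = summarize(fixed_runs)
random_mean, random_std = summarize(random_runs)
```

In this toy setting the fixed-seed runs are identical, so their standard deviation is zero. Real GPU training is usually less tidy: some operations are nondeterministic even with fixed seeds, so seeding tends to reduce variance rather than eliminate it.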

Paper Structure

This paper contains 15 sections, 9 figures, and 5 tables.

Introduction
Computational and spatial challenges toward achieving R&R in GeoAI research
  Computational challenges in achieving R&R in GeoAI
  Spatial challenges in achieving R&R in GeoAI
Data and Method
  Dataset
  Tasks and model
  Experimental design
Results Analysis
  Impact of sample size on model stability and performance
  Random seed effects on model consistency
  Locational variance on GeoAI model performance based on random sampling
  Locational variance in GeoAI model performance across varying latitude
  Locational variance on GeoAI model performance across varying longitude
Conclusion

Figures (9)

Figure 1: The R&R spectrum in GeoAI research.
Figure 2: Validation accuracy of the GeoAI model across different training epochs with varying sample sizes. "Max: (Epoch, mAP50)" in the legend indicates the epoch at which the model achieves the highest predictive accuracy, measured by the standard metric mAP50 (mAP: mean average precision; 50 denotes the intersection-over-union (IoU) threshold of 0.5 at which a detection counts as correct). The corresponding maximum values are also highlighted as dots on the performance curves.
Figure 3: The highest validation and testing accuracy (measured by mAP50) of GeoAI models trained with different training dataset sizes.
Figure 4: Initialization of random seeds in Python to measure model reproducibility.
Figure 5: Variance of the model performance with fixed (nonrandom) and random seed settings for both validation and testing datasets. "mean" and "std" refer to the average and standard deviation of prediction accuracy values over 20 model runs.
...and 4 more figures
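
The 0.5 threshold behind the mAP50 metric named in the captions above can be made concrete with a small example. This is a minimal sketch assuming axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max); the function names and box coordinates are hypothetical, and the sketch covers only the matching criterion, not the full averaging over recall levels that mAP involves.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max).
    """
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, truth_box, threshold=0.5):
    """A prediction counts as correct when IoU meets the threshold."""
    return iou(pred_box, truth_box) >= threshold


truth = (0.0, 0.0, 10.0, 10.0)
close_pred = (2.0, 0.0, 12.0, 10.0)  # overlaps most of the truth box
loose_pred = (5.0, 0.0, 15.0, 10.0)  # overlaps only half of it

close_ok = is_match(close_pred, truth)
loose_ok = is_match(loose_pred, truth)
```

Here the first prediction has an IoU of 80/120 and passes the threshold, while the second has an IoU of 50/150 and does not.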
