Sims: An Interactive Tool for Geospatial Matching and Clustering

Akram Zaytar, Girmaw Abebe Tadesse, Caleb Robinson, Eduardo G. Bendito, Medha Devare, Meklit Chernet, Gilles Q. Hacheme, Rahul Dodhia, Juan M. Lavista Ferres

TL;DR

The paper addresses slow geospatial feature discovery in large spatio-temporal domains by introducing Sims, a no-code tool that uses Google Earth Engine (GEE) to perform clustering and similarity search over user-defined regions. Sims focuses on feature exploration rather than model building, helping users quickly identify predictive geospatial variables and regions of interest. A Rwanda case study using simulated maize yield data shows how different combinations of soil, weather, and agronomic features produce distinct yield-response zones, with strong statistical separation at $K=5$ (e.g., $p<2\times10^{-16}$). Sims is open source, reduces local computational demands by offloading work to GEE, and supports downstream analysis through downloadable raster outputs. Future work aims to improve workflow persistence, scalability, and automated feature selection, which would broaden its practical use for geospatial modeling and decision support.

Abstract

Acquiring, processing, and visualizing geospatial data requires significant computing resources, especially for large spatio-temporal domains. This challenge hinders the rapid discovery of predictive features, which is essential for advancing geospatial modeling. To address this, we developed Similarity Search (Sims), a no-code web tool that allows users to perform clustering and similarity search over defined regions of interest using Google Earth Engine as a backend. Sims is designed to complement existing modeling tools by focusing on feature exploration rather than model creation. We demonstrate the utility of Sims through a case study analyzing simulated maize yield data in Rwanda, where we evaluate how different combinations of soil, weather, and agronomic features affect the clustering of yield response zones. Sims is open source and available at https://github.com/microsoft/Sims

Paper Structure

This paper contains 18 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of Sims. The interface includes functionalities such as searching the data catalog, drawing or uploading regions of interest, creating spatio-temporal domains, loading & visualizing layers from Google Earth Engine, creating custom variable expressions (i.e., features), and downloading the resulting cluster or similarity maps.
  • Figure 2: Clustering workflow in Sims. First, define the spatial extent by uploading or drawing a region of interest. Second, set the temporal period for analysis. Third, select and load relevant variables from GEE. Finally, apply clustering to produce distinct zones based on the selected features. The resulting zones represent areas with similar geospatial characteristics.
  • Figure 3: Similarity search workflow in Sims. First, define both search and reference regions by uploading or drawing geometries. Second, set the temporal period and optionally configure land cover masking and distance metrics. Third, select and load relevant variables from GEE. Finally, generate a heat map showing distances between reference and query regions in feature space. The resulting visualization highlights areas that share similar characteristics with the reference region.
  • Figure 4: Simulated maize yield (kg/ha) distributions for each of the clusters of maize yield patterns in Rwanda. We experimented with an increasing number of clusters (K) for the set of features from the agronomy domain. The number below each boxplot represents the sample size. Overlapping notches indicate non-significant differences in yield.
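
The clustering and similarity search workflows described in Figures 2 and 3 can be illustrated with a small local sketch. This is not the Sims implementation, which runs on Google Earth Engine. It is a toy example under stated assumptions: the clustering algorithm is taken to be plain k-means, the distance metric is taken to be Euclidean distance to the mean of the reference pixels, and the pixel values (rainfall in mm, soil pH) are invented for illustration.

```python
import math
import random


def standardize(rows):
    """Z-score each feature column so no single variable dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed algorithm); returns one label per row."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(r, centers[j]))
                  for r in rows]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def similarity_map(search_rows, reference_rows):
    """Distance of each search pixel to the mean reference pixel (assumed metric)."""
    ref_mean = [sum(c) / len(reference_rows) for c in zip(*reference_rows)]
    return [euclidean(r, ref_mean) for r in search_rows]


# Hypothetical pixels: [seasonal rainfall (mm), soil pH]. Values are made up.
pixels = [[800, 5.5], [820, 5.6], [790, 5.4],
          [1400, 6.8], [1420, 6.9], [1380, 6.7]]

features = standardize(pixels)

# Clustering workflow (Figure 2): group pixels into K zones.
zones = kmeans(features, k=2)

# Similarity search workflow (Figure 3): distance of every pixel
# to a reference region, here the first two pixels.
distances = similarity_map(features, features[:2])

print("zones:", zones)
print("distances:", [round(d, 2) for d in distances])
```

In this toy setup the two groups of pixels fall into separate zones, and pixels that resemble the reference region receive smaller distances. In Sims the same two ideas are applied to raster layers loaded from GEE, and the results are rendered as cluster maps or heat maps.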