OBSR: Open Benchmark for Spatial Representations

Julia Moska, Oleksii Furman, Kacper Kozaczko, Szymon Leszkiewicz, Jakub Polczyk, Piotr Gramacki, Piotr Szymański

TL;DR

OBSR introduces a modular, modality-agnostic geospatial benchmark that unifies seven diverse urban datasets to evaluate spatial representations on region-based and trajectory-based tasks. It combines multi-resolution evaluation on the H3 hexagonal grid, transparent preprocessing, and standardized train-test splits with simple, reproducible baselines that anchor comparisons. The framework includes tailored evaluation metrics (including $R^2$ for bounded crime intensity and several trajectory distances) and integrates with the public SRAI library to support reproducibility and extensibility. By releasing multi-task benchmarks together with open data and code, OBSR aims to accelerate robust GeoAI development and enable fair cross-method comparisons in urban spatial intelligence.
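As a point of reference for the $R^2$ metric mentioned above, the following is a minimal sketch of the standard coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. It is not taken from the OBSR code base: the function name `r2_score` and the toy crime-intensity values are assumptions made purely for illustration.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration only; not the benchmark's own code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1.0 - ss_res / ss_tot


# Toy, made-up per-hexagon intensities bounded in [0, 1].
observed = [0.0, 0.5, 1.0]
perfect = r2_score(observed, [0.0, 0.5, 1.0])
mean_only = r2_score(observed, [0.5, 0.5, 0.5])
```

Under this definition a perfect prediction scores 1, and a model that always predicts the mean of the targets scores 0, which makes the metric easy to compare against a trivial baseline.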

Abstract

GeoAI is evolving rapidly, fueled by diverse geospatial datasets such as traffic patterns, environmental data, and crowdsourced OpenStreetMap (OSM) information. While sophisticated AI models are being developed, existing benchmarks often concentrate on a single task and are restricted to a single modality. As a result, progress in GeoAI is limited by the lack of a standardized, multi-task, modality-agnostic benchmark for systematically evaluating these models. This paper introduces a novel benchmark designed to assess the performance, accuracy, and efficiency of geospatial embedders. Our benchmark is modality-agnostic and comprises seven distinct datasets from diverse cities across three continents, which supports generalizability and helps mitigate demographic biases. It allows GeoAI embedders to be evaluated on various phenomena that exhibit underlying geographic processes. Furthermore, we establish simple and intuitive task-oriented model baselines, providing a crucial reference point for comparing more complex solutions.

Paper Structure

This paper contains 40 sections, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Mean house prices per hexagon in King County.
  • Figure 2: Spatially consistent class-encoding scheme in which each hexagon is encoded relative to its predecessor.
  • Figure 3: Target feature distribution remains stable for different resolutions in the HPP task.
  • Figure 4: Target feature distribution changes with different resolutions in the CAP task.
  • Figure 5: Splitting algorithm ensures no spatial overlap between train and test sets.
  • ...and 6 more figures