Spatial Data Science Languages: commonalities and needs

Edzer Pebesma, Martin Fleischmann, Josiah Parry, Jakub Nowosad, Anita Graser, Dewey Dunnington, Maarten Pronk, Rafael Schouten, Robin Lovelace, Marius Appel, Lorena Abad

TL;DR

This paper investigates how spatial data science operates across R, Python, and Julia, focusing on coordinating workflows for spatial and spatio-temporal data. It synthesizes insights from the Spatial Data Science Languages workshops, identifying challenges in file formats, geodetic handling, data cubes, and cross-language development, and proposes open standards, field-domain alignment, and community practices. The authors compare GIS and modelling conventions, discuss trajectories and moving features, and describe cross-language infrastructure as an emerging need. The findings aim to improve interoperability, reproducibility, and collaboration across language ecosystems in spatial data science.

Abstract

Recent workshops brought together several developers, educators and users of software packages extending popular languages for spatial data handling, with a primary focus on R, Python and Julia. Common challenges discussed included handling of spatial or spatio-temporal support, geodetic coordinates, in-memory vector data formats, data cubes, inter-package dependencies, packaging upstream libraries, differences in habits or conventions between the GIS and physical modelling communities, and statistical models. The following insights were formulated: (i) considering software problems across data science language silos helps to understand and standardise analysis approaches, also outside the domain of formal standardisation bodies; (ii) whether attribute variables have block or point support, and whether they are spatially intensive or extensive, has consequences for permitted operations, and hence for software implementing those; (iii) handling geometries on the sphere rather than on the flat plane requires modifications to the logic of simple features; (iv) managing communities and fostering diversity is a necessary, ongoing effort; and (v) tools for cross-language development need more attention and support.

Paper Structure

This paper contains 28 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Dependency of R and Python spatial packages on other libraries and external system requirements. Green long-dashed arrows indicate optional dependencies, purple dotted arrows indicate planned optional dependencies, and orange dashed lines indicate optional dependencies through Xarray's extension mechanism.
  • Figure 2: Geospatial stack in Julia. As in other languages, the well-known C++ libraries are wrapped (blue), resulting in the core packages GeometryOps.jl, GeoDataFrames.jl, and Rasters.jl (red). A number of packages implement native file formats (orange). Note that while the stack depends on other non-spatial Julia packages (purple), much of the interaction, and indeed many of the dependencies, are replaced by the implementation of interfaces using traits (green). Only where native Julia types were not adequate did we create spatial versions of them (yellow).
  • Figure 3: Support of a variable: retrieving the values of the black polygons for the red point or the red square requires knowing whether values (grey) are associated with each point in a black polygon (point support) or whether they summarise properties of all points in a polygon (block support). Conversely, going from red to black geometries involves the same problem.
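
The distinction in Figure 3, together with insight (ii) of the abstract, can be illustrated with a small sketch. It is not taken from the paper or from any of the packages it discusses: the `Rect` class and the `transfer` function are hypothetical names, axis-aligned rectangles stand in for polygons, and values are assumed to be uniformly distributed within each source geometry. Under those assumptions, a block-support value is moved to a new geometry by area weighting, and the rule differs for spatially extensive variables (such as a population count, which is split) and spatially intensive ones (such as a population density, which is averaged).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a polygon (illustrative only)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        # A rectangle with inverted bounds (no overlap) has zero area.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def intersection(self, other: "Rect") -> "Rect":
        return Rect(
            max(self.xmin, other.xmin),
            max(self.ymin, other.ymin),
            min(self.xmax, other.xmax),
            min(self.ymax, other.ymax),
        )


def transfer(sources, target, extensive):
    """Area-weighted transfer of block-support values onto `target`.

    `sources` is a list of (Rect, value) pairs. Assumes values are spread
    uniformly within each source rectangle.
    """
    overlaps = [
        (src.intersection(target).area, src.area, value) for src, value in sources
    ]
    if extensive:
        # Each source contributes the share of its total that lies in the target.
        return sum(v * a / src_area for a, src_area, v in overlaps if src_area > 0)
    # Intensive: average the source values, weighted by overlapping area.
    covered = sum(a for a, _, _ in overlaps)
    if covered == 0:
        raise ValueError("target does not overlap any source geometry")
    return sum(v * a for a, _, v in overlaps) / covered


# Two source rectangles ("black polygons") and one target ("red square").
sources = [(Rect(0, 0, 2, 2), 100.0), (Rect(2, 0, 4, 2), 40.0)]
target = Rect(1, 0, 4, 2)

as_count = transfer(sources, target, extensive=True)     # half of 100 plus all of 40
as_density = transfer(sources, target, extensive=False)  # area-weighted mean
print(as_count, as_density)
```

The same input numbers give different answers depending on the declared kind of variable, which is why software needs that information to decide which operations are permitted. For a point-support variable no weighting is needed, because the polygon's value applies at every location inside it.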