
GeoNDC: A Queryable Neural Data Cube for Planetary-Scale Earth Observation

Jianbo Qi, Mengyao Li, Baogui Jiang, Yidan Chen, Xihan Mu, Qiao Wang

Abstract

Satellite Earth observation has accumulated massive spatiotemporal archives essential for monitoring environmental change, yet these archives remain organized as discrete raster files, making them costly to store, transmit, and query. We present GeoNDC, a queryable neural data cube that encodes planetary-scale Earth observation data as a continuous spatiotemporal implicit neural field, enabling on-demand queries and continuous-time reconstruction without full decompression. Experiments on a 20-year global MODIS MCD43A4 reflectance record ($8016 \times 4008$ pixels, 7 bands, 915 temporal frames) show that the learned representation supports direct spatiotemporal queries on consumer hardware. On Sentinel-2 imagery (10 m), the continuous temporal parameterization recovers cloud-free dynamics with high fidelity ($R^2 > 0.85$) under simulated 2-km cloud occlusion. On HiGLASS biophysical products (LAI and FPAR), GeoNDC attains near-perfect accuracy ($R^2 > 0.98$). The representation compresses the 20-year MODIS archive to 0.44 GB, a compression ratio of approximately 95:1 relative to an optimized Int16 baseline, while retaining high spectral fidelity (mean $R^2 > 0.98$, mean RMSE $= 0.021$). These results suggest that GeoNDC offers a unified AI-native representation for planetary-scale Earth observation, complementing raw archives with a compact, analysis-ready data layer that integrates query, reconstruction, and compression in a single framework.
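The abstract reports fidelity as $R^2$ and RMSE averaged over bands. A minimal sketch of those metrics follows, assuming $R^2$ is the coefficient of determination $1 - SS_{res}/SS_{tot}$ (the paper may instead use a squared correlation) and that "mean" is an unweighted average over the 7 bands. The function names are illustrative, not taken from the paper.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, over all samples."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the data (e.g. reflectance)."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def band_mean_scores(truth, recon):
    """Score each band of arrays shaped (bands, ...) and return (mean R^2, mean RMSE)."""
    r2 = [r2_score(t, p) for t, p in zip(truth, recon)]
    err = [rmse(t, p) for t, p in zip(truth, recon)]
    return float(np.mean(r2)), float(np.mean(err))
```

With `truth` and `recon` shaped `(7, H, W)` or `(7, T, H, W)`, `band_mean_scores(truth, recon)` returns the two band-averaged numbers in the form the abstract quotes.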

Paper Structure

This paper contains 23 sections, 8 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overview of the GeoNDC representation paradigm. Conventional Earth observation archives are typically organized as discrete, file-based raster tiles or chunked arrays across space and time. GeoNDC reformulates such archives as a queryable spatiotemporal neural data cube, enabling unified point and region queries, arbitrary-time reconstruction, and web-based visualization from a single executable representation.
  • Figure 2: Architecture of the spatiotemporal embedding. A query point $(x,y,t)$ from the original spatiotemporal data cube is processed through a decoupled dual-branch structure designed to address the intrinsic spatiotemporal anisotropy of Earth observations. To preserve high-frequency spatial boundaries, the spatial coordinates $(x,y)$ are queried directly in a static high-resolution 2D HashGrid (bottom). To capture smoother regional spatiotemporal dynamics while reducing temporal striping artifacts, a spatial scaling factor $s$ is applied to generate scaled coordinates $(s x, s y, t)$, which are then fed into a coarse 3D HashGrid (top). The hierarchical descriptors extracted by interpolation, $\mathbf{f}_{xy}$ and $\mathbf{f}_{T}$, are concatenated to form the final spatiotemporal mixed embedding vector.
  • Figure 3: The GeoNDC architecture. The framework maps coordinates from the original spatiotemporal data cube into a high-dimensional feature space through spatiotemporal geometric embedding. The resulting mixed embedding vector $\mathbf{F}$ is decoded by an MLP to predict the surface state $\widehat{\mathbf{v}}$. Model parameters are optimized through backpropagation by comparing the prediction $\widehat{\mathbf{v}}$ with the ground-truth value $\mathbf{v}$. To preserve localized high-frequency details, residuals $\mathbf{r} = \mathbf{v} - \widehat{\mathbf{v}}$ whose magnitude exceeds a threshold $\tau$ are quantized and stored in a sparse residual package using entropy coding, forming an optional correction layer on top of the base neural representation.
  • Figure 4: The GeoNDC unified storage protocol and access pipeline. The GeoNDC Encoder compresses raw spatiotemporal data cubes into a compact .gndc file composed of three components: (1) a Global Geospatial Header that preserves geospatial metadata required for interoperability, including the coordinate reference system (CRS) and temporal indexing information; (2) a Neural Payload containing the quantized parameters of the learned neural field; and (3) an optional Physical Correction Layer that stores validity masks and sparse residuals. The GeoNDC Reader supports direct random access, allowing arbitrary points or regions in space and time, $(x,y,t)$, to be queried without full-volume decompression.
  • Figure 5: High-resolution surface reconstruction under incomplete observations. (a) Mask-and-Restore experiment: Visual comparison between Sentinel-2 snapshots with simulated gaps of varying scales (small, medium, and large; left) and the corresponding GeoNDC reconstructions (right). (b) Reconstruction fidelity in valid regions: Density scatter plots for the Red (B4) and NIR (B8) bands, comparing GeoNDC reconstructions with the original observations in non-masked regions. (c) Recovery accuracy across gap sizes: Quantitative comparison between GeoNDC reconstructions and the ground-truth observations within masked regions for different gap scales. (d) Interpolation baseline comparison: Reconstruction performance of a traditional linear interpolation method over the same masked regions and spectral bands.
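The dual-branch embedding described in the Figure 2 caption can be sketched with a forward-only, Instant-NGP-style multiresolution hash encoding in NumPy. Everything concrete here is an assumption for illustration: coordinates normalized to [0, 1], the level counts, table size, base resolutions, growth factor, the value s = 0.25, and the names `HashGrid` and `DualBranchEmbedding`. The MLP decoder and the training loop are omitted.

```python
import itertools

import numpy as np

# Per-dimension primes for spatial hashing (the values used by Instant-NGP).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


class HashGrid:
    """Forward-only multiresolution hash encoding for coordinates in [0, 1]^dim."""

    def __init__(self, dim, n_levels, n_features=2, log2_table=14,
                 base_res=16, growth=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.table_size = 2 ** log2_table
        self.resolutions = [int(base_res * growth ** lvl) for lvl in range(n_levels)]
        # One small feature table per level; in training these would be learned.
        self.tables = rng.uniform(-1e-4, 1e-4, (n_levels, self.table_size, n_features))
        # All 2^dim corners of a grid cell, e.g. (0,0), (0,1), (1,0), (1,1) in 2D.
        self.corners = np.array(list(itertools.product([0, 1], repeat=dim)))

    def _hash(self, vertex):
        """XOR of coordinate * prime per dimension, modulo table size (uint64 wraps)."""
        v = vertex.astype(np.uint64)
        h = np.zeros(v.shape[:-1], dtype=np.uint64)
        for d in range(self.dim):
            h ^= v[..., d] * PRIMES[d]
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def __call__(self, coords):
        """coords: (N, dim) array. Returns (N, n_levels * n_features) features."""
        feats = []
        for lvl, res in enumerate(self.resolutions):
            pos = coords * res
            cell = np.floor(pos).astype(np.int64)
            frac = pos - cell
            level_feat = np.zeros((coords.shape[0], self.tables.shape[2]))
            for corner in self.corners:
                # Multilinear (bilinear in 2D, trilinear in 3D) interpolation weight.
                w = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=1)
                level_feat += w[:, None] * self.tables[lvl][self._hash(cell + corner)]
            feats.append(level_feat)
        return np.concatenate(feats, axis=1)


class DualBranchEmbedding:
    """Static 2D grid on (x, y) plus a coarse 3D grid on (s*x, s*y, t), concatenated."""

    def __init__(self, s=0.25):
        self.s = s
        self.spatial = HashGrid(dim=2, n_levels=12, base_res=16, seed=1)
        self.spatiotemporal = HashGrid(dim=3, n_levels=6, base_res=4, seed=2)

    def __call__(self, x, y, t):
        f_xy = self.spatial(np.stack([x, y], axis=1))
        f_T = self.spatiotemporal(np.stack([self.s * x, self.s * y, t], axis=1))
        return np.concatenate([f_xy, f_T], axis=1)  # mixed embedding F for the MLP
```

Because the 3D branch sees $(s x, s y, t)$ with $s < 1$, each of its grid cells spans a larger ground area than a 2D-branch cell, which is one way to obtain the coarse regional dynamics the caption describes while the 2D branch keeps fine spatial detail that does not vary with time.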
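The optional correction layer in the Figure 3 caption (keep residuals with $|\mathbf{r}| > \tau$, quantize them, entropy-code them) might look like the sketch below. The uniform quantization step, the delta-coded indices, and the use of zlib's DEFLATE as a stand-in for the paper's entropy coder are all assumptions; `encode_residuals` and `decode_residuals` are hypothetical names.

```python
import zlib

import numpy as np


def encode_residuals(v, v_hat, tau, step):
    """Build a sparse residual package from ground truth v and prediction v_hat."""
    r = (np.asarray(v, dtype=np.float64) - v_hat).ravel()
    idx = np.flatnonzero(np.abs(r) > tau)         # only residuals above the threshold
    q = np.round(r[idx] / step).astype(np.int16)  # uniform quantization (|r|/step must fit int16)
    # Delta-code the sorted flat indices; small gaps compress better than absolute positions.
    gaps = np.diff(idx, prepend=0).astype(np.uint32)
    # zlib/DEFLATE (LZ77 + Huffman) stands in for the unspecified entropy coder.
    payload = zlib.compress(gaps.tobytes() + q.tobytes(), 9)
    return payload, len(idx)


def decode_residuals(payload, count, shape, step):
    """Expand the package into a dense residual array (zeros where nothing was stored)."""
    raw = zlib.decompress(payload)
    gaps = np.frombuffer(raw[:4 * count], dtype=np.uint32)
    q = np.frombuffer(raw[4 * count:], dtype=np.int16)
    idx = np.cumsum(gaps.astype(np.int64))
    r = np.zeros(int(np.prod(shape)), dtype=np.float64)
    r[idx] = q * step
    return r.reshape(shape)
```

At read time the corrected value is $\widehat{\mathbf{v}}$ plus the decoded residual wherever one was stored, so each corrected sample lies within half a quantization step of the original; raising $\tau$ shrinks the package at the cost of leaving more error uncorrected.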
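The three-part .gndc layout in the Figure 4 caption can be illustrated with a toy container: magic bytes, a length-prefixed JSON header holding geospatial metadata and section offsets, then raw binary sections. This byte layout, the `GNDC` signature, the header keys (`crs`, `bbox`, `time_range`, `sections`), and the names `pack_gndc` and `GndcReader` are hypothetical and do not describe the actual file format; the `field` argument stands in for a decoder rebuilt from the neural payload.

```python
import io
import json
import struct

MAGIC = b"GNDC"  # hypothetical 4-byte signature


def pack_gndc(header, sections):
    """Serialize: MAGIC | uint32 header length | JSON header | concatenated sections.

    `sections` maps names such as "neural", "mask", "residuals" to bytes; their
    offsets relative to the end of the header are recorded in the header itself.
    """
    offsets, cursor = {}, 0
    for name, blob in sections.items():
        offsets[name] = [cursor, len(blob)]
        cursor += len(blob)
    head = json.dumps({**header, "sections": offsets}).encode("utf-8")
    return MAGIC + struct.pack("<I", len(head)) + head + b"".join(sections.values())


class GndcReader:
    def __init__(self, fp):
        self.fp = fp  # binary file object positioned at the start of the container
        if fp.read(4) != MAGIC:
            raise ValueError("not a GNDC container")
        (n,) = struct.unpack("<I", fp.read(4))
        self.header = json.loads(fp.read(n).decode("utf-8"))
        self.data_start = 8 + n

    @classmethod
    def from_bytes(cls, blob):
        return cls(io.BytesIO(blob))

    def section(self, name):
        """Seek to one section and read only its bytes; other sections are never read."""
        offset, length = self.header["sections"][name]
        self.fp.seek(self.data_start + offset)
        return self.fp.read(length)

    def to_unit_coords(self, lon, lat, frame):
        """Map (lon, lat, frame index) to the normalized [0, 1] cube the field is assumed to use."""
        west, south, east, north = self.header["bbox"]
        t0, t1 = self.header["time_range"]
        return ((lon - west) / (east - west),
                (north - lat) / (north - south),
                (frame - t0) / (t1 - t0))

    def query(self, lon, lat, frame, field):
        """Point query: normalize coordinates, then evaluate the (stand-in) neural field."""
        return field(*self.to_unit_coords(lon, lat, frame))
```

Recording offsets in the header lets a reader fetch only the sections it needs, and a point query reduces to a coordinate transform plus one field evaluation, which is the sense in which no full-volume decompression is required.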
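The Figure 5 caption compares GeoNDC with a traditional linear interpolation baseline scored within masked regions. The caption does not say whether that interpolation runs along time or across space; the sketch below assumes per-pixel temporal interpolation with NumPy's `np.interp` and computes $R^2$ over hidden samples only. The function names are illustrative, and the per-pixel loop favors clarity over speed.

```python
import numpy as np


def temporal_linear_fill(cube, mask):
    """Baseline gap filling: per-pixel linear interpolation along time.

    cube: (T, H, W) observations; mask: (T, H, W) bool, True where a value is hidden.
    Gaps before the first or after the last valid frame take the nearest valid
    value (np.interp's edge behavior); pixels with no valid frame become NaN.
    """
    n_t = cube.shape[0]
    t = np.arange(n_t)
    vals = cube.reshape(n_t, -1).astype(np.float64)
    hidden = mask.reshape(n_t, -1)
    out = vals.copy()
    for p in range(vals.shape[1]):
        valid = ~hidden[:, p]
        if not valid.any():
            out[:, p] = np.nan
        elif not valid.all():
            out[~valid, p] = np.interp(t[~valid], t[valid], vals[valid, p])
    return out.reshape(cube.shape)


def masked_r2(truth, pred, mask):
    """R^2 = 1 - SS_res / SS_tot, computed only over masked (hidden) samples."""
    y, p = truth[mask], pred[mask]
    keep = np.isfinite(p)
    y, p = y[keep], p[keep]
    return float(1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2))
```

Scoring only masked samples mirrors panel (c) of the caption, whereas panel (b) would apply the same metric to the complement of the mask.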