SCENT: Robust Spatiotemporal Learning for Continuous Scientific Data via Scalable Conditioned Neural Fields

David Keetae Park, Xihaier Luo, Guang Zhao, Seungjun Lee, Miruna Oprescu, Shinjae Yoo

TL;DR

SCENT introduces a scalable, continuity-informed spatiotemporal learning framework that unifies interpolation, reconstruction, and forecasting for continuous scientific data. It uses a transformer-based encoder-processor-decoder with learnable queries, a Time-Targeted Spatial Encoder, and a Temporal Warp Processor to capture multi-scale dependencies, aided by sparse attention for efficiency. Across Navier-Stokes benchmarks, simulated large-scale datasets, and real AirDelhi PM2.5 data, SCENT achieves state-of-the-art or competitive performance with strong robustness to noise, missing data, and dynamic sensor patterns. The work demonstrates the viability of time-continuous representations from irregular data, with broad implications for geophysics, climate science, and environmental monitoring.

Abstract

Spatiotemporal learning is challenging due to the intricate interplay between spatial and temporal dependencies, the high dimensionality of the data, and scalability constraints. These challenges are further amplified in scientific domains, where data is often irregularly distributed (e.g., missing values from sensor failures) and high-volume (e.g., high-fidelity simulations), posing additional computational and modeling difficulties. In this paper, we present SCENT, a novel framework for scalable and continuity-informed spatiotemporal representation learning. SCENT unifies interpolation, reconstruction, and forecasting within a single architecture. Built on a transformer-based encoder-processor-decoder backbone, SCENT introduces learnable queries to enhance generalization and a query-wise cross-attention mechanism to effectively capture multi-scale dependencies. To ensure scalability in both data size and model complexity, we incorporate a sparse attention mechanism, enabling flexible output representations and efficient evaluation at arbitrary resolutions. We validate SCENT through extensive simulations and real-world experiments, demonstrating state-of-the-art performance across multiple challenging tasks while achieving superior scalability.
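
The abstract's combination of learnable queries and cross-attention can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a Perceiver-style reading in which a fixed set of learnable query vectors attends over a variable number of observation tokens, and output coordinates then attend over the resulting latent. All names and sizes (`learnable_queries`, `encode`, `decode`, `d`, `n_latent`) are hypothetical, the weights are random rather than trained, and the sparse attention mechanism is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query row attends over all context rows."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
d = 16        # hypothetical embedding width
n_latent = 8  # hypothetical number of learnable queries

# In a trained model these would be learned parameters; here they are random.
learnable_queries = rng.normal(size=(n_latent, d))
enc_weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
dec_weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]


def encode(obs_tokens):
    # Any number of observations is summarized into n_latent latent vectors.
    return cross_attention(learnable_queries, obs_tokens, *enc_weights)


def decode(coord_tokens, latent):
    # Any number of output coordinates can query the same latent.
    return cross_attention(coord_tokens, latent, *dec_weights)


# Two inputs with different sensor counts give latents of identical shape.
latent_a = encode(rng.normal(size=(37, d)))
latent_b = encode(rng.normal(size=(512, d)))

# Decoding at 1000 arbitrary (already embedded) coordinates.
out = decode(rng.normal(size=(1000, d)), latent_a)
print(latent_a.shape, latent_b.shape, out.shape)
```

Under these assumptions the latent size is independent of the number of sensors, and the decoder can be evaluated at any number of output locations, which is one way to obtain the flexible output representations and arbitrary-resolution evaluation that the abstract describes.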

Paper Structure

This paper contains 46 sections, 20 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: Challenging data scenarios motivating this study. (a) Learning a continuous ground-truth signal (GT) from noisy, malfunctioning, sparse, or moving sensors is a daunting challenge. However, these challenges are common in scientific data, including the AirDelhi data, which measure particulate matter levels (PM2.5) from moving vehicles.
  • Figure 2: SCENT overview. (a) Detailed architecture of SCENT. Our unique contributions are marked with red boxes. Specifically, we introduce time coordinates to both the encoder and the decoder for learning continuous-time representations. We also introduce a Context Embedding Network and a Calibration Network for improved spatial encoding and decoding, respectively. $n_\text{enc}$, $n_\text{proc}$, and $n_\text{dec}$ denote the number of layers in the encoder, processor, and decoder, respectively. (b) At the inference stage, a single SCENT model can jointly perform reconstruction, spatiotemporal interpolation, and forecasting.
  • Figure 3: Warp-unrolling forecasting (WUF). (a) Conventional forecasting relies on single-step unrolling, which accumulates error. A dimming blue color represents the increasing error. (b) WUF helps mitigate the error accumulation caused by extensive unrolling steps.
  • Figure 4: Scalability evaluations. (a) The label next to each circle gives the number of model parameters, and the circle size is proportional to it. Red dotted lines are scalability trends fitted with exponential functions for comparison. (b) Colors indicate training runs with identical model sizes. The label next to each circle gives the number of training instances, and the circle size is proportional to it.
  • Figure 5: Qualitative comparisons for forecasting. (a) Using partial input from the S5 dataset, each pretrained model is assessed on the full mesh (GT), jointly performing forecasting and spatial interpolation. (b) Models forecast PM2.5 at the spatiotemporal locations where measurements are available (GT). 'delta' (upper rows) represents the absolute difference between each prediction and GT.
  • ...and 4 more figures