OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning

Kevin Valencia, Thilina Balasooriya, Xihaier Luo, Shinjae Yoo, David Keetae Park

TL;DR

OmniField tackles the problem of learning from sparse, irregular, and noisy multimodal spatiotemporal data in which the set of available modalities varies. It introduces a continuity-aware conditioned neural field that combines multimodal crosstalk (MCT), iterative cross-modal refinement (ICMR), and fleximodal fusion to integrate context across modalities before decoding, without gridding or surrogate imputation. On the ClimSim-THW and EPA-AQS benchmarks, OmniField outperforms eight strong baselines and remains robust under heavy sensor noise, highlighting its practical viability for real-world scientific sensing. The work offers a unified framework for reconstruction, interpolation, forecasting, and cross-modal prediction under incomplete observations, with implications for climate science, air quality monitoring, and other multimodal physical systems.
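
The conditioning idea in the TL;DR (fuse whichever modalities are present, then decode at arbitrary continuous coordinates without gridding or imputation) can be illustrated with a toy sketch. This is not the OmniField architecture: the per-modality encoders, the mean pooling over available modalities, the tanh MLPs, the modality names, and the random weights standing in for trained parameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                            # hypothetical latent width
MODALITIES = ["temperature", "humidity", "wind"]  # hypothetical modality names

def init(d_in, d_hidden, d_out):
    """Random two-layer MLP weights (stand-ins for trained parameters)."""
    return (rng.normal(0.0, 0.3, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, 0.3, (d_hidden, d_out)), np.zeros(d_out))

def mlp(x, w1, b1, w2, b2):
    return np.tanh(x @ w1 + b1) @ w2 + b2

# One encoder per modality: (x, y, t, value) -> D-dim token.
encoders = {m: init(4, 32, D) for m in MODALITIES}
# Shared decoder: (x, y, t, context) -> one prediction per modality.
decoder = init(3 + D, 32, len(MODALITIES))

def encode(observations):
    """observations: dict modality -> (n_i, 4) array of (x, y, t, value).
    Unavailable modalities are simply absent; nothing is imputed or gridded."""
    tokens = []
    for m in MODALITIES:
        obs = observations.get(m)
        if obs is None or len(obs) == 0:
            continue
        # Mean over points: invariant to point order, defined for any point count.
        tokens.append(mlp(obs, *encoders[m]).mean(axis=0))
    if not tokens:
        raise ValueError("at least one modality must be observed")
    return np.mean(tokens, axis=0)  # fuse whichever modalities are present

def query(context, coords):
    """Evaluate the conditioned field at arbitrary continuous (x, y, t) coords."""
    z = np.broadcast_to(context, (len(coords), D))
    return mlp(np.concatenate([coords, z], axis=1), *decoder)

# Usage: humidity is missing for this instance.
obs = {"temperature": rng.uniform(size=(5, 4)), "wind": rng.uniform(size=(3, 4))}
ctx = encode(obs)
pred = query(ctx, rng.uniform(size=(10, 3)))
print(pred.shape)  # (10, 3)
```

Because the context vector has the same shape regardless of which modalities contributed, the same decoder serves any modality subset at train and test time.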

Abstract

Multimodal spatiotemporal learning on real-world experimental data is constrained by two challenges: (i) within-modality measurements are sparse, irregular, and noisy (e.g., QA/QC artifacts), yet correlated across modalities; and (ii) the set of available modalities varies across space and time, which shrinks the usable record unless models can adapt to arbitrary modality subsets at both training and test time. We propose OmniField, a continuity-aware framework that learns a continuous neural field conditioned on the available modalities and iteratively fuses cross-modal context. A multimodal crosstalk (MCT) block architecture paired with iterative cross-modal refinement (ICMR) aligns signals before the decoder, enabling unified reconstruction, interpolation, forecasting, and cross-modal prediction without gridding or surrogate preprocessing. Extensive evaluations show that OmniField consistently outperforms eight strong multimodal spatiotemporal baselines. Under heavy simulated sensor noise, its performance remains close to clean-input levels, highlighting robustness to corrupted measurements.
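
Iterative cross-modal refinement can be sketched as repeated rounds in which each available modality's latent attends to the latents of the other available modalities and receives a residual update. The single-head dot-product attention, the exclusion of self-attention, the weights shared across rounds, and the function names are hypothetical choices for illustration, not the paper's MCT/ICMR design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_round(latents, available, wq, wk, wv):
    """One refinement round (hypothetical design).
    latents: (M, D), one latent per modality; available: bool (M,)."""
    q, k, v = latents @ wq, latents @ wk, latents @ wv
    scores = q @ k.T / np.sqrt(latents.shape[1])  # (M, M)
    # Each modality may attend only to *other* modalities that are present.
    mask = available[None, :] & ~np.eye(len(latents), dtype=bool)
    attn = softmax(np.where(mask, scores, -1e9), axis=1) * mask
    update = attn @ v  # rows with no valid peer get a zero update
    # Residual update; latents of missing modalities are left untouched.
    return latents + np.where(available[:, None], update, 0.0)

def iterative_refinement(latents, available, weights, n_rounds=3):
    """Apply the same cross-modal round n_rounds times (shared weights)."""
    for _ in range(n_rounds):
        latents = cross_modal_round(latents, available, *weights)
    return latents

# Usage: 4 modalities, modality 1 unobserved in this instance.
rng = np.random.default_rng(0)
M, D = 4, 8
latents = rng.normal(size=(M, D))
available = np.array([True, False, True, True])
weights = tuple(rng.normal(0.0, 0.3, (D, D)) for _ in range(3))
refined = iterative_refinement(latents, available, weights)
```

Masking by availability, rather than filling in missing modalities, is what lets a single set of weights run on any modality subset in this sketch.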

Paper Structure

This paper contains 42 sections, 19 equations, 19 figures, 7 tables.

Figures (19)

  • Figure 1: (a) Data challenges: sparse, irregular, noisy, and dynamic measurements. (b) Modality challenges: misaligned supports, modality-specific noise, and variable modality availability. (c) Real-world example: ambient air pollution data collected from hundreds of monitors.
  • Figure 2: Overview of our approach. We illustrate (a) the prior SCENT encoder [Park2025SCENT], (b) our proposed encoder ($\textbf{E}$), (c) our multimodal crosstalk (MCT) block, and (d) our proposed OmniField architecture equipped with the iterative cross-modal refinement (ICMR) strategy.
  • Figure 3: Qualitative comparisons on ClimSim. Given highly sparse yet multimodal observations, models generate full-field forecasts at $\Delta t=6$ hours, which we compare against the ground truth. RMSE relative to the ground truth is shown in white boxes.
  • Figure 4: (a) Multimodal training results for OmniField on ClimSim, comparing four training strategies (Co-Loc = co-location; Interp = interpolation; out-$m$ = queried-out modality). (b) EPA-AQS baselines trained with two, four, and six modalities are compared. (c) Models trained on all six EPA-AQS modalities are compared; six representative models are shown. All values are RMSE in physical units.
  • Figure 5: (a)-(c) ICMR is compared against Mid-Fusion under increasing instance-level noise severity. (d) OmniField outperforms both SCENT and RainNet on an established rainfall nowcasting task.
  • ...and 14 more figures
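
The RMSE and noise-robustness evaluations referenced in Figures 4 and 5 can be approximated with a small harness. The noise model below (Gaussian noise whose standard deviation is `severity` times each modality's standard deviation) is an assumption; the exact instance-level noise definition is not given here. RMSE is computed per modality over observed entries only, since sparse data leaves most positions unobserved.

```python
import numpy as np

def add_instance_noise(x, severity, rng):
    """Hypothetical noise model: zero-mean Gaussian noise whose std is
    `severity` times each modality's (column's) std in this instance."""
    scale = severity * np.nanstd(x, axis=0, keepdims=True)
    return x + rng.normal(size=x.shape) * scale

def masked_rmse(pred, target, mask):
    """Per-modality RMSE over observed entries only (mask True = observed).
    Arrays have shape (n_points, n_modalities); returns NaN for a modality
    with no observed entries."""
    err2 = np.where(mask, (pred - target) ** 2, 0.0)
    count = mask.sum(axis=0)
    with np.errstate(invalid="ignore"):
        return np.sqrt(err2.sum(axis=0) / count)

# Usage: sweep severities on synthetic data (~30% of entries observed).
rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 3))
mask = rng.uniform(size=clean.shape) < 0.3
for severity in (0.0, 0.5, 1.0):
    noisy = add_instance_noise(clean, severity, rng)
    print(severity, masked_rmse(noisy, clean, mask))
```

Here the RMSE of the corrupted input against the clean signal serves as a reference level; a noise-robust model's reconstruction should stay well below it as severity grows.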