
GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring

Madhav Khirwar, Ankur Narang

TL;DR

GeoFormer tackles the challenge of predicting surface-level NO$_2$ distributions in regions with limited ground-based monitoring by fusing a Vision Transformer for satellite imagery with an efficient time-series Transformer for temporal NO$_2$ dynamics. The approach uses cross-attention to integrate spatial and temporal cues from paired Sentinel-5P imagery and ground measurements, achieving a reported MAE of 5.65 and strong efficiency on a 15-month, 35-station European dataset. Key contributions include a paired Sentinel-5P–NO$_2$ dataset, a compact spatio-temporal transformer architecture with ProbSparse attention, and a cross-attention fusion mechanism that outperforms Sentinel-2–based baselines while requiring significantly less compute. The work has practical implications for real-time climate monitoring and emission regulation, with potential extension to other pollutants and broader geographic coverage.

Abstract

Air pollution represents a pivotal environmental challenge globally, playing a major role in climate change via greenhouse gas emissions and negatively affecting the health of billions. However, predicting the spatial and temporal patterns of pollutants remains challenging. The scarcity of ground-based monitoring facilities and the dependency of air pollution modeling on comprehensive datasets, which are often inaccessible for many areas, complicate this issue. In this work, we introduce GeoFormer, a compact model that combines a vision transformer module with a highly efficient time-series transformer module to predict surface-level nitrogen dioxide (NO$_2$) concentrations from Sentinel-5P satellite imagery. We train the proposed model to predict surface-level NO$_2$ measurements using a dataset we constructed from Sentinel-5P images of ground-level monitoring stations and their corresponding NO$_2$ concentration readings. The proposed model attains high accuracy (MAE 5.65), demonstrating the efficacy of combining vision and time-series transformer architectures to harness satellite-derived data for enhanced GHG emission insights, and supporting climate change monitoring and emission regulation efforts globally.
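
To make the fusion idea concrete, the following is a minimal sketch of single-head cross-attention between time-series tokens and image-patch tokens, plus the MAE metric quoted above. It is an illustration under stated assumptions, not the authors' implementation: the choice of time-series tokens as queries and image patches as keys/values, the token counts, the embedding width, and every function name here are hypothetical, and the sketch uses plain dense attention rather than ProbSparse attention.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: tokens from one modality (assumed here: time-series tokens).
    context: tokens from the other modality (assumed here: image patches).
    """
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def random_matrix(rows, cols, rng, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 6 time steps, 9 image patches, embedding width 8.
rng = random.Random(0)
d_model = 8
ts_tokens = random_matrix(6, d_model, rng)
img_tokens = random_matrix(9, d_model, rng)
w_q = random_matrix(d_model, d_model, rng, 0.1)
w_k = random_matrix(d_model, d_model, rng, 0.1)
w_v = random_matrix(d_model, d_model, rng, 0.1)

fused, weights = cross_attention(ts_tokens, img_tokens, w_q, w_k, w_v)
```

Each row of `fused` is a time-step representation enriched with a weighted mix of image-patch features, and each row of `weights` sums to 1. In a full model, a regression head would map such fused tokens to NO$_2$ predictions, which `mae` would then score against station readings.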

Paper Structure (15 sections, 3 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: GeoFormer model architecture. Here, $m_t$ represents an NO$_2$ prediction at timestamp $t$, and CAM represents the cross-attention module.
  • Figure 2: Example of Sentinel-5P imagery with corresponding surface-level NO$_2$ concentrations for 6 consecutive days at the same location.