GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring

Madhav Khirwar; Ankur Narang

TL;DR

GeoFormer predicts surface-level NO$_2$ distributions in regions with limited ground-based monitoring by fusing a Vision Transformer for satellite imagery with an efficient time-series Transformer for temporal NO$_2$ dynamics. The approach uses cross-attention to integrate spatial and temporal cues from paired Sentinel-5P imagery and ground measurements, achieving a reported MAE of 5.65 on a 15-month, 35-station European dataset. Key contributions include a paired Sentinel-5P–NO$_2$ dataset, a compact spatio-temporal transformer architecture with ProbSparse attention, and a cross-attention fusion mechanism that outperforms Sentinel-2–based baselines while requiring significantly less compute. The work has practical implications for real-time climate monitoring and emission regulation, with potential extension to other pollutants and broader geographic coverage.
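As a rough illustration of the cross-attention fusion idea, the following minimal sketch lets temporal tokens attend over image patch tokens using scaled dot-product attention. It is not the authors' implementation: the choice of which stream supplies the queries, the absence of learned projection matrices, the token dimensions, and all names (`cross_attention`, `temporal_tokens`, `patch_tokens`) are assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of query vectors (assumed here: temporal tokens)
    keys, values: lists of vectors (assumed here: image patch tokens)
    Returns one fused vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return fused


# Hypothetical toy inputs: 3 temporal tokens, 2 patch tokens, dimension 2.
temporal_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(temporal_tokens, patch_tokens, patch_tokens)
```

Each fused vector is a convex combination of the patch tokens, weighted by how strongly the corresponding temporal token matches each patch. A full model would add learned query, key, and value projections and multiple heads.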

Abstract

Air pollution is a pivotal global environmental challenge: it plays a major role in climate change via greenhouse gas emissions and negatively affects the health of billions. However, predicting the spatial and temporal patterns of pollutants remains difficult. Ground-based monitoring facilities are scarce, and air pollution modeling depends on comprehensive datasets that are often inaccessible for many areas. In this work, we introduce GeoFormer, a compact model that combines a vision transformer module with a highly efficient time-series transformer module to predict surface-level nitrogen dioxide (NO2) concentrations from Sentinel-5P satellite imagery. We train the proposed model on a dataset we constructed from Sentinel-5P images of ground-level monitoring stations and their corresponding NO2 concentration readings. The proposed model attains high accuracy (MAE 5.65), demonstrating that combining vision and time-series transformer architectures can harness satellite-derived data for improved GHG emission insights and support climate change monitoring and emission regulation efforts globally.
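For readers unfamiliar with the reported metric, mean absolute error (MAE) is the average absolute difference between predicted and observed values. The sketch below computes it on made-up numbers; the values and the function name `mean_absolute_error` are hypothetical and are not taken from the paper's data or code.

```python
def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over paired readings."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical surface-level NO2 readings and model predictions (same units).
observed = [20.0, 35.0, 12.0, 28.0]
predicted = [24.0, 30.0, 15.0, 28.0]
mae = mean_absolute_error(observed, predicted)  # (4 + 5 + 3 + 0) / 4 = 3.0
```

MAE is expressed in the same units as the measurements, so a reported value such as 5.65 should be read against the typical range of the station readings.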

Paper Structure (15 sections, 3 equations, 2 figures, 1 table)

Introduction
Related Work
Transformer Models
Deep Learning for Greenhouse Gas Emissions
Methodology
Vision Transformer Module
Efficient Time-series Transformer
Integration of Spatio-temporal Features via Cross Attention
Experimentation and Results
Data Collection and Training
Results
Conclusion and Future Work
Dataset
Data Collection Process
Data Samples

Figures (2)

Figure 1: GeoFormer model architecture. Here, $m_t$ represents an NO2 prediction at timestamp $t$, and CAM represents the cross-attention module.
Figure 2: Example of Sentinel-5P imagery with corresponding surface-level NO2 concentrations for 6 consecutive days at the same location.
