GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring
Madhav Khirwar, Ankur Narang
TL;DR
GeoFormer tackles the challenge of predicting surface-level NO$_2$ distributions in regions with limited ground-based monitoring by fusing a Vision Transformer for satellite imagery with an efficient time-series Transformer for temporal NO$_2$ dynamics. The approach uses cross-attention to integrate spatial and temporal cues from paired Sentinel-5P imagery and ground measurements, achieving a reported MAE of 5.65 at low computational cost on a 15-month, 35-station European dataset. Key contributions include a paired Sentinel-5P–NO$_2$ dataset, a compact spatio-temporal transformer architecture with ProbSparse attention, and a cross-attention fusion mechanism that outperforms Sentinel-2–based baselines while requiring significantly less compute. The work has practical implications for real-time climate monitoring and emission regulation, with potential extension to other pollutants and broader geographic coverage.
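The cross-attention fusion mentioned above can be illustrated with a minimal NumPy sketch of single-head scaled dot-product cross-attention. This is not the authors' implementation: the choice of temporal tokens as queries and image patch tokens as keys and values, the token counts, the embedding size, and all names (`cross_attention`, `temporal_tokens`, `patch_tokens`) are assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (T, d_model) tokens from one modality (assumed: time series)
    context: (P, d_model) tokens from the other modality (assumed: image patches)
    """
    q = queries @ w_q  # (T, d)
    k = context @ w_k  # (P, d)
    v = context @ w_v  # (P, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, P) similarity scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights              # fused tokens (T, d), weights (T, P)


# Hypothetical sizes: 8 time steps, 25 image patches, 16-dimensional embeddings.
rng = np.random.default_rng(0)
d_model = 16
temporal_tokens = rng.normal(size=(8, d_model))
patch_tokens = rng.normal(size=(25, d_model))
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

fused, attn = cross_attention(temporal_tokens, patch_tokens, w_q, w_k, w_v)
```

Here each temporal token receives a weighted mixture of the image patch representations, so `fused` has one row per time step. A practical model would typically use multiple heads, residual connections, and learned projections rather than the random matrices used in this sketch.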
Abstract
Air pollution is a pivotal global environmental challenge: it plays a major role in climate change via greenhouse gas emissions and negatively affects the health of billions. However, predicting the spatial and temporal patterns of pollutants remains difficult. Ground-based monitoring facilities are scarce, and air pollution modeling depends on comprehensive datasets that are often inaccessible for many areas. In this work, we introduce GeoFormer, a compact model that combines a vision transformer module with a highly efficient time-series transformer module to predict surface-level nitrogen dioxide (NO$_2$) concentrations from Sentinel-5P satellite imagery. We train the proposed model on a dataset we constructed, which pairs Sentinel-5P images of ground-level monitoring stations with their corresponding NO$_2$ concentration readings. The proposed model attains high accuracy (MAE 5.65), demonstrating the efficacy of combining vision and time-series transformer architectures to harness satellite-derived data for enhanced GHG emission insights. Such insights can help advance climate change monitoring and emission regulation efforts globally.
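For readers unfamiliar with the reported metric, mean absolute error (MAE) is the average absolute difference between predicted and observed values. The short sketch below shows the computation; the station readings and predictions are hypothetical numbers chosen for illustration and are not taken from the paper's dataset.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    # Average of |observed - predicted| over all samples.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical surface NO2 readings and model predictions (same units).
observed = [20.0, 35.0, 12.0, 48.0]
predicted = [24.0, 30.0, 15.0, 40.0]

mae = mean_absolute_error(observed, predicted)  # absolute errors: 4, 5, 3, 8
```

MAE is expressed in the same units as the target variable, so a reported value is only interpretable alongside the measurement units of the ground-station readings.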
