PollutionNet: A Vision Transformer Framework for Climatological Assessment of NO$_2$ and SO$_2$ Using Satellite-Ground Data Fusion

Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan

Abstract

Accurate assessment of atmospheric nitrogen dioxide (NO$_2$) and sulfur dioxide (SO$_2$) is essential for understanding climate-air quality interactions, supporting environmental policy, and protecting public health. Traditional monitoring approaches face limitations: satellite observations provide broad spatial coverage but suffer from data gaps, while ground-based sensors offer high temporal resolution but limited spatial extent. To address these challenges, we propose PollutionNet, a Vision Transformer-based framework that integrates Sentinel-5P TROPOMI vertical column density (VCD) data with ground-level observations. By leveraging self-attention mechanisms, PollutionNet captures complex spatiotemporal dependencies that are often missed by conventional CNN and RNN models. In a case study over Ireland (2020-2021), PollutionNet achieves state-of-the-art performance (RMSE: 6.89 $\mu$g/m$^3$ for NO$_2$ and 4.49 $\mu$g/m$^3$ for SO$_2$), reducing prediction errors by up to 14% compared to baseline models. Beyond accuracy gains, PollutionNet provides a scalable and data-efficient tool for applied climatology, enabling robust pollution assessments in regions with sparse monitoring networks. These results highlight the potential of advanced machine learning approaches to enhance climate-related air quality research, inform environmental management, and support sustainable policy decisions.
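The headline metric is the root-mean-square error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$, expressed in the same units as the concentrations ($\mu$g/m$^3$). Assuming the "up to 14%" figure is a relative RMSE reduction against a baseline model, it corresponds to $(\mathrm{RMSE}_{\text{base}} - \mathrm{RMSE}_{\text{model}})/\mathrm{RMSE}_{\text{base}}$. The minimal Python sketch below illustrates both quantities; the function names and all numbers are hypothetical toy values, not data or results from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the inputs (e.g. ug/m^3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def relative_reduction(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of a model relative to a baseline (0.14 = 14%)."""
    return (baseline_rmse - model_rmse) / baseline_rmse

# Made-up toy values, NOT measurements or results from the paper.
observed = [20.0, 35.0, 18.0, 42.0]
predicted = [22.0, 31.0, 19.0, 44.0]
print(rmse(observed, predicted))      # 2.5
print(relative_reduction(5.0, 4.0))   # 0.2
```

Because RMSE squares each residual before averaging, a few large misses (for example, at pollution peaks) raise it more than many small ones, which is why it is a common headline metric for concentration prediction.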

Paper Structure

This paper contains 27 sections, 9 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Study region for satellite and ground observations of NO$_2$ and SO$_2$ concentrations. Grid cells represent the spatial domains for each pollutant.
  • Figure 2: The proposed PollutionNet framework for estimating and predicting NO$_2$ and SO$_2$ concentrations.
  • Figure 3: End-to-end spatial-temporal fusion workflow showing: (1) input data acquisition from TROPOMI satellite and ground stations, (2) quality control and preprocessing, (3) core fusion algorithm execution, and (4) gap-filled output generation.
  • Figure 4: Technical implementation of the fusion algorithm showing: (a) neighborhood selection criteria based on spatial similarity thresholds, (b) weight allocation methodology, and (c) reconstruction results for complete and partial gap scenarios.
  • Figure 5: Architecture of the Vision Transformer (ViT) for NO$_2$ and SO$_2$ concentration prediction, showing: (1) patch embedding and positional encoding, (2) multi-head self-attention layers, and (3) MLP projection head.
  • ...and 5 more figures
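Figures 3 and 4 describe the gap-filling fusion step only at the level of neighborhood selection, weight allocation, and reconstruction. The following is a minimal sketch under stated assumptions, not PollutionNet's actual algorithm: it assumes a 2D VCD grid with missing cells marked as NaN, a co-registered `reference` field (hypothetical, e.g. an interpolated ground-station surface) used for the similarity threshold, and inverse-distance weighting as the weight allocation rule. The function `fill_gaps` and its parameters are hypothetical.

```python
import numpy as np

def fill_gaps(grid, reference, radius=2, sim_threshold=5.0, power=2.0):
    """Hypothetical neighborhood-based gap filling (not the paper's algorithm).

    grid:      2D array of VCD values with NaN marking missing cells.
    reference: co-registered 2D auxiliary field used only for the
               similarity test (assumption, e.g. interpolated ground data).
    A valid neighbor within `radius` cells is kept if its reference value is
    within `sim_threshold` of the target cell's; the gap is then filled with
    the inverse-distance-weighted mean (weight = 1 / distance**power).
    Cells with no qualifying neighbor stay NaN.
    """
    grid = np.asarray(grid, dtype=float)
    reference = np.asarray(reference, dtype=float)
    filled = grid.copy()  # read from `grid`, write to `filled`: no cascading fills
    rows, cols = grid.shape
    for i, j in zip(*np.nonzero(np.isnan(grid))):
        weights, values = [], []
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                ni, nj = i + di, j + dj
                if (di, dj) == (0, 0) or not (0 <= ni < rows and 0 <= nj < cols):
                    continue  # skip the target cell and out-of-bounds cells
                if np.isnan(grid[ni, nj]):
                    continue  # neighbor itself is missing
                # Similarity criterion; written so that NaN references fail it.
                if not abs(reference[ni, nj] - reference[i, j]) <= sim_threshold:
                    continue
                weights.append(1.0 / np.hypot(di, dj) ** power)
                values.append(grid[ni, nj])
        if weights:
            filled[i, j] = np.average(values, weights=weights)
    return filled

# Toy example (made-up numbers): the right-hand neighbor is dissimilar in
# the reference field, so only the left value (10.0) fills the gap.
vcd = np.array([[10.0, np.nan, 50.0]])
ref = np.array([[0.0, 1.0, 100.0]])
print(fill_gaps(vcd, ref))
```

Reading only from the original grid keeps the result independent of the order in which gaps are visited; a "complete gap" with no similar valid neighbor is left as NaN rather than extrapolated.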
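Figure 5 lists the ViT components without giving hyperparameters. As a rough, hypothetical sketch, the NumPy forward pass below chains those components: patch embedding, learned positional encoding, multi-head self-attention, and an MLP head regressing two outputs (NO$_2$, SO$_2$). Assumptions: random untrained weights, a single encoder block, a three-channel input grid, ReLU in place of the GELU usually used in ViTs, and layer normalization without learned scale or shift. All names and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) array into flattened patches of shape (N, patch*patch*C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of patches, cols of patches, patch, patch, c)
    return x.reshape(-1, patch * patch * c)

def layer_norm(x, eps=1e-6):
    """Normalize the last axis (no learned gain/bias in this sketch)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, heads):
    """Scaled dot-product attention over all tokens, split across `heads`."""
    n, d = x.shape
    assert d % heads == 0
    dh = d // heads
    def split(t):  # (n, d) -> (heads, n, dh)
        return t.reshape(n, heads, dh).transpose(1, 0, 2)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)       # merge heads
    return out @ wo

def init_params(rng, patch_dim, n_tokens, d, mlp_hidden, n_out, scale=0.02):
    """Random (untrained) weights; only the shapes are meaningful."""
    shapes = {
        "w_patch": (patch_dim, d),       # linear patch embedding
        "cls": (1, d),                   # learnable [CLS] token
        "pos": (n_tokens + 1, d),        # learned positional embedding
        "wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d),
        "w1": (d, mlp_hidden), "w2": (mlp_hidden, d),  # encoder MLP
        "w_head": (d, n_out),            # regression head
    }
    return {k: rng.normal(0.0, scale, s) for k, s in shapes.items()}

def vit_forward(image, p, patch, heads):
    tokens = patchify(image, patch) @ p["w_patch"]              # (1) patch embedding
    x = np.concatenate([p["cls"], tokens], axis=0) + p["pos"]   # (1) + positions
    x = x + multi_head_self_attention(layer_norm(x), p["wq"], p["wk"],
                                      p["wv"], p["wo"], heads)  # (2) MHSA + residual
    h = np.maximum(0.0, layer_norm(x) @ p["w1"])                # encoder MLP (ReLU)
    x = x + h @ p["w2"]
    return layer_norm(x[0]) @ p["w_head"]                       # (3) head on [CLS]

# Toy 16x16 grid with 3 channels (random numbers, NOT real data), 4x4 patches.
rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16, 3))
params = init_params(rng, patch_dim=4 * 4 * 3, n_tokens=(16 // 4) ** 2,
                     d=32, mlp_hidden=64, n_out=2)
print(vit_forward(image, params, patch=4, heads=4))  # two untrained outputs
```

Every patch token attends to every other token in one step, which is the mechanism the abstract credits for capturing long-range spatial dependencies that a CNN's local receptive field reaches only through stacked layers.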