Synergistic Neural Forecasting of Air Pollution with Stochastic Sampling

Yohan Abeysinghe, Muhammad Akhtar Munir, Sanoojan Baliah, Ron Sarafian, Fahad Shahbaz Khan, Yinon Rudich, Salman Khan

TL;DR

This work addresses the underprediction of extreme air pollution events by developing SynCast, a high-resolution neural forecast model that jointly models meteorology and particulate matter (PM) using a regionally adapted transformer backbone. A diffusion-based stochastic refinement module is integrated to sharpen spatial features and better capture tail events, with region-specific LoRA fine-tuning to adapt the model to local conditions. Across the Middle East and North Africa, SynCast delivers substantial improvements in 24-hour PM forecasts and tail-sensitive metrics, and maintains competitive performance in unseen regions and longer lead times, illustrating robust generalization. The approach offers a scalable framework for air quality early warning and climate-health risk mitigation, balancing global weather priors with local adaptation while addressing the computational demands of high-resolution forecasting.

Abstract

Air pollution remains a leading global health and environmental risk, particularly in regions vulnerable to episodic pollution spikes caused by wildfires, urban haze, and dust storms. Accurate forecasting of particulate matter (PM) concentrations is essential to enable timely public health warnings and interventions, yet existing models often underestimate rare but hazardous pollution events. Here, we present SynCast, a high-resolution neural forecasting model that integrates meteorological and air composition data to improve predictions of both average and extreme pollution levels. Built on a regionally adapted transformer backbone and enhanced with a diffusion-based stochastic refinement module, SynCast captures the nonlinear dynamics driving PM spikes more accurately than existing approaches. Leveraging harmonized ERA5 and CAMS datasets, our model shows substantial gains in forecasting fidelity across multiple PM variables (PM$_1$, PM$_{2.5}$, PM$_{10}$), especially under extreme conditions. We demonstrate that conventional loss functions underrepresent distributional tails (rare pollution events) and show that SynCast, guided by domain-aware objectives and extreme value theory, significantly enhances performance in highly impacted regions without compromising global accuracy. This approach provides a scalable foundation for next-generation air quality early warning systems and supports climate-health risk mitigation in vulnerable regions.
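
To make the claim about conventional loss functions concrete, the toy sketch below compares plain mean squared error with a threshold-weighted variant on a synthetic PM series. It is an illustration under stated assumptions, not SynCast's actual objective: the function `tail_weighted_mse`, the threshold of 100 µg/m³, the weight of 10, and all data values are hypothetical and chosen only to show how rare spikes can be outvoted by the bulk of ordinary samples.

```python
def mse(pred, target):
    """Plain mean squared error over paired samples."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def tail_weighted_mse(pred, target, threshold, tail_weight):
    """Hypothetical tail-aware loss (an assumption, not the paper's objective).

    Squared errors on samples whose target exceeds `threshold` are scaled
    by `tail_weight`; the result is normalized by the sum of the weights.
    """
    total = 0.0
    weight_sum = 0.0
    for p, t in zip(pred, target):
        w = tail_weight if t > threshold else 1.0
        total += w * (p - t) ** 2
        weight_sum += w
    return total / weight_sum


# Synthetic series (values in µg/m³): 98 background samples, 2 extreme spikes.
target = [20.0] * 98 + [150.0] * 2

# Forecast A: perfect on background, badly underpredicts the spikes.
smooth = [20.0] * 98 + [50.0] * 2

# Forecast B: biased on background, but nearly captures the spikes.
sharp = [35.0] * 98 + [140.0] * 2

print("plain MSE     smooth:", mse(smooth, target), " sharp:", mse(sharp, target))
print("tail-weighted smooth:", tail_weighted_mse(smooth, target, 100.0, 10.0),
      " sharp:", tail_weighted_mse(sharp, target, 100.0, 10.0))
```

In this toy setup, plain MSE ranks the forecast that misses both spikes as the better one, while the weighted variant reverses the ranking. Objectives grounded in extreme value theory, as the abstract describes, address the same imbalance in a more principled way than a fixed threshold weight.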

Paper Structure

This paper contains 7 sections, 11 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: We compare PM$_1$ forecasts from Aurora and SynCast with CAMS ground truth over four time steps between 12–13 June 2022. While both models capture the overall pollution patterns, SynCast shows better agreement with CAMS, especially in detecting sudden local spikes in pollution levels. These regions, which are missed by Aurora but captured by SynCast, are marked with red circles. Additional qualitative comparisons and extended visualizations are provided in Appendix A.
  • Figure 2: PM$_{2.5}$ predictions during a major dust storm episode in May 2022. The first panel shows input conditions from the previous timestep. The second and third panels present predictions from the deterministic baseline and SynCast with diffusion-based refinement, respectively. The fourth panel shows Aurora forecasts, while the fifth panel displays the CAMS target data. Compared to both the deterministic baseline and Aurora, SynCast more accurately reconstructs the spatial extent and intensity of the dust plume.
  • Figure 3: Generalization results over the Chinese region. SynCast was trained exclusively on the MENA domain but is evaluated here for PM$_1$ forecasts in China. The first panel shows outputs from a globally full fine-tuned model (FFT), which tends to oversmooth local structures. In contrast, SynCast (middle) better preserves sharp gradients and plume coherence, capturing broad spatial patterns and major pollution events with closer alignment to the CAMS target (right). Some localized degradation remains, but the results highlight SynCast's stronger transferability relative to naive global fine-tuning.
  • Figure 4: Performance comparison of SynCast and Aurora across increasing lead times (1–6 days) on PM$_1$, PM$_{2.5}$, and PM$_{10}$ forecasting over the MENA region. RMSE ($\mu$g/m$^3$) is reported using CAMS as ground truth. SynCast consistently outperforms Aurora across all time horizons and particulate matter types, demonstrating more stable and accurate long-range predictions.
  • Figure 5: RMSE comparison for PM$_1$ predictions across several Middle Eastern countries, evaluating SynCast against Aurora. SynCast consistently achieves lower RMSE, demonstrating its improved generalization and regional adaptability in country-level forecasts.
  • ...and 5 more figures