Synergistic Neural Forecasting of Air Pollution with Stochastic Sampling
Yohan Abeysinghe, Muhammad Akhtar Munir, Sanoojan Baliah, Ron Sarafian, Fahad Shahbaz Khan, Yinon Rudich, Salman Khan
TL;DR
This work addresses the underprediction of extreme air pollution events by developing SynCast, a high-resolution neural forecast model that jointly models meteorology and particulate matter (PM) using a regionally adapted transformer backbone. A diffusion-based stochastic refinement module is integrated to sharpen spatial features and better capture tail events, with region-specific LoRA fine-tuning to adapt the model to local conditions. Across the Middle East and North Africa, SynCast delivers substantial improvements in 24-hour PM forecasts and tail-sensitive metrics, and maintains competitive performance in unseen regions and longer lead times, illustrating robust generalization. The approach offers a scalable framework for air quality early warning and climate-health risk mitigation, balancing global weather priors with local adaptation while addressing the computational demands of high-resolution forecasting.
Abstract
Air pollution remains a leading global health and environmental risk, particularly in regions vulnerable to episodic air pollution spikes due to wildfires, urban haze and dust storms. Accurate forecasting of particulate matter (PM) concentrations is essential to enable timely public health warnings and interventions, yet existing models often underestimate rare but hazardous pollution events. Here, we present SynCast, a high-resolution neural forecasting model that integrates meteorological and air composition data to improve predictions of both average and extreme pollution levels. Built on a regionally adapted transformer backbone and enhanced with a diffusion-based stochastic refinement module, SynCast captures the nonlinear dynamics driving PM spikes more accurately than existing approaches. Leveraging harmonized ERA5 and CAMS datasets, our model shows substantial gains in forecasting fidelity across multiple PM variables (PM$_1$, PM$_{2.5}$, PM$_{10}$), especially under extreme conditions. We demonstrate that conventional loss functions underrepresent distributional tails (rare pollution events) and show that SynCast, guided by domain-aware objectives and extreme value theory, significantly enhances performance in highly impacted regions without compromising global accuracy. This approach provides a scalable foundation for next-generation air quality early warning systems and supports climate-health risk mitigation in vulnerable regions.
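
The claim that conventional loss functions underrepresent distributional tails can be illustrated with a toy calculation. The sketch below is not SynCast's objective, which the abstract does not specify; it compares plain mean squared error with a hypothetical threshold-weighted variant on made-up PM values. The function names, the threshold of 150, the weight of 10, and both forecasts are assumptions chosen only for illustration.

```python
def mse(pred, obs):
    """Plain mean squared error: every sample counts equally."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def tail_weighted_mse(pred, obs, threshold, tail_weight):
    """Hypothetical tail-aware loss: squared errors on samples whose observed
    value is at or above `threshold` are multiplied by `tail_weight`."""
    total = 0.0
    for p, o in zip(pred, obs):
        w = tail_weight if o >= threshold else 1.0
        total += w * (p - o) ** 2
    return total / len(obs)


# Made-up observations: 99 background values and one extreme spike.
obs = [20.0] * 99 + [300.0]

# Forecast A matches the background exactly but misses the spike entirely.
misses_spike = [20.0] * 100

# Forecast B captures the spike but is biased by 30 on every background sample.
captures_spike = [50.0] * 99 + [300.0]

# Under plain MSE, the single missed spike is diluted by the 99 background
# samples, so forecast A scores better (lower) than forecast B.
mse_a = mse(misses_spike, obs)
mse_b = mse(captures_spike, obs)

# With the tail sample up-weighted, the ranking flips in favour of forecast B.
tw_a = tail_weighted_mse(misses_spike, obs, threshold=150.0, tail_weight=10.0)
tw_b = tail_weighted_mse(captures_spike, obs, threshold=150.0, tail_weight=10.0)

print("plain MSE:      A =", mse_a, " B =", mse_b)
print("tail-weighted:  A =", tw_a, " B =", tw_b)
```

In this toy setting, an optimizer minimizing plain MSE would be rewarded for ignoring the rare event, which is the kind of tail underrepresentation the abstract describes. A fixed threshold weight is only one simple way to counter it, and is a stand-in here for the domain-aware, extreme-value-guided objectives the abstract mentions.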
