Urban Air Pollution Forecasting: a Machine Learning Approach leveraging Satellite Observations and Meteorological Forecasts
Giacomo Blanco, Luca Barco, Lorenzo Innocenti, Claudio Rossi
TL;DR
Addresses the challenge of forecasting urban air pollution in areas with sparse monitoring networks by fusing Sentinel-5P satellite data, meteorological observations, and static topographic features to predict next-day concentrations of five pollutants in Milan. The approach trains one model per pollutant on multi-source data organized in $w$-day windows and compares Linear Regression, Gradient Boosting Regression, and SGD Regression; Gradient Boosting most often achieves the best accuracy. A public Milan dataset is released to spur further research, and predictions are extrapolated onto a 500-meter grid so that the method also covers areas lacking monitoring stations. The work demonstrates the practical utility of satellite-ground data fusion for timely, city-scale air quality forecasts and motivates future extensions to longer horizons, a unified multi-pollutant model, and deep learning approaches.
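The $w$-day windowing mentioned above can be illustrated with a minimal sketch. It assumes that the features of the $w$ days preceding the forecast day are flattened into a single input vector and paired with the pollutant value of the forecast day; the function name `make_windows`, the feature layout, and the numbers in the usage example are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    w: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build (input, label) pairs for next-day forecasting.

    features[t] holds the daily feature vector for day t (assumed layout),
    target[t] the pollutant concentration observed on day t.
    """
    if w < 1:
        raise ValueError("w must be at least 1")
    if len(features) != len(target):
        raise ValueError("features and target must cover the same days")

    X: List[List[float]] = []
    y: List[float] = []
    # Day t is the forecast day; days t-w .. t-1 form the input window.
    for t in range(w, len(features)):
        window: List[float] = []
        for day in features[t - w:t]:
            window.extend(day)  # flatten the w daily vectors into one row
        X.append(window)
        y.append(target[t])
    return X, y


# Illustrative values only: two features per day, four days, w = 2.
X, y = make_windows(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]],
    [5.0, 6.0, 7.0, 8.0],
    w=2,
)
```

Rows built this way could then be passed to any of the three regressors being compared, with one model fitted per pollutant.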
Abstract
Air pollution poses a significant threat to public health and well-being, particularly in urban areas. This study introduces a series of machine-learning models that integrate data from the Sentinel-5P satellite, meteorological conditions, and topographical characteristics to forecast future levels of five major pollutants. We describe the data collection process and how the different data sources are combined. In experiments conducted in the Milan metropolitan area, the models predict next-day pollutant levels with a percentage error of around 30%. Because the proposed models do not depend on monitoring stations, they can be used in areas without existing infrastructure. Additionally, we have released the collected dataset to the public, aiming to stimulate further research in this field. This research contributes to advancing our understanding of urban air quality dynamics and emphasizes the importance of combining satellite, meteorological, and topographical data to develop robust pollution forecasting models.
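To make the "percentage error" figure concrete, the sketch below computes a mean absolute percentage error. This is an assumption: the abstract does not state which percentage error definition is used, and the function name and sample values are purely illustrative.

```python
from typing import Sequence


def mean_absolute_percentage_error(
    observed: Sequence[float],
    predicted: Sequence[float],
) -> float:
    """Return the mean of |observed - predicted| / |observed|, in percent.

    Days with an observed value of zero are skipped to avoid division by
    zero (an assumption of this sketch, not a rule from the paper).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    terms = [
        abs(o - p) / abs(o)
        for o, p in zip(observed, predicted)
        if o != 0
    ]
    if not terms:
        raise ValueError("no non-zero observations to evaluate")
    return 100.0 * sum(terms) / len(terms)


# Illustrative concentrations only (not data from the study).
error = mean_absolute_percentage_error([40.0, 50.0, 20.0], [30.0, 60.0, 26.0])
```

With these made-up values the per-day errors are 25%, 20%, and 30%, so the mean is about 25%.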
