Tackling Incomplete Data in Air Quality Prediction: A Bayesian Deep Learning Framework for Uncertainty Quantification
Yuzhuang Pian, Taiyu Wang, Shiqi Zhang, Rui Xu, Yonghong Liu
TL;DR
This work addresses air quality forecasting from incomplete observations by introducing CGLUBNF, an end-to-end Bayesian deep learning framework. It combines a multilevel spatiotemporal encoder (temporal harmonics, spatial Fourier features, and graph attention) with channel gated learning units and a MAP-based Bayesian predictor that outputs predictive means with calibrated uncertainty. Across two real-world datasets and four missing-data patterns, the model is more accurate and produces sharper prediction intervals than five baselines, and it remains robust to different missingness structures. The approach supports reliable spatiotemporal forecasting in emerging sensing scenarios, including mobile and irregularly sampled monitoring, with practical implications for health alerts, emissions control, and decision making under uncertainty.
Abstract
Accurate air quality forecasts are vital for public health alerts, exposure assessment, and emissions control. In practice, observational data are often missing in varying proportions and patterns because of collection and transmission issues. These incomplete spatiotemporal records impede reliable inference and risk assessment and can lead to overconfident extrapolation. To address these challenges, we propose an end-to-end framework, the channel gated learning unit based spatiotemporal Bayesian neural field (CGLUBNF). It uses Fourier features with a graph attention encoder to capture multiscale spatial dependencies and seasonal temporal dynamics. A channel gated learning unit, equipped with learnable activations and gated residual connections, adaptively filters and amplifies informative features. Bayesian inference jointly optimizes predictive distributions and parameter uncertainty, producing point estimates and calibrated prediction intervals. We conduct a systematic evaluation on two real-world datasets, covering four typical missing-data patterns and comparing against five state-of-the-art baselines. CGLUBNF achieves superior prediction accuracy and sharper prediction intervals. We further validate robustness across multiple prediction horizons and analyze the contribution of extraneous variables. This research lays a foundation for reliable deep-learning-based spatiotemporal forecasting with incomplete observations in emerging sensing paradigms, such as real-world vehicle-borne mobile monitoring.
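
To make two of the ideas above concrete, the following is a minimal sketch and not the authors' implementation. It shows how timestamps can be encoded as sine and cosine harmonics of seasonal periods, and how the coverage and sharpness of a prediction interval can be measured when some targets are missing. The function names, the daily and weekly periods (24 and 168 hours), the number of harmonics, and the Gaussian 95% interval (z = 1.96) are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def temporal_harmonics(t_hours, periods=(24.0, 168.0), n_harmonics=2):
    """Encode one timestamp (in hours) as sin/cos harmonics.

    The periods (daily, weekly) and harmonic count are illustrative
    assumptions, not values taken from the paper.
    """
    feats = []
    for period in periods:
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t_hours / period
            feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def interval_metrics(y_true, mean, std, z=1.96):
    """Return (coverage, mean width) of the interval mean +/- z * std.

    Missing targets (None or NaN) are skipped, so the metrics are
    computed on observed values only. Assumes a Gaussian predictive
    distribution, which is a simplification for this sketch.
    """
    hits, widths = [], []
    for y, m, s in zip(y_true, mean, std):
        if y is None or math.isnan(y):
            continue  # target not observed: nothing to score
        lower, upper = m - z * s, m + z * s
        hits.append(lower <= y <= upper)
        widths.append(upper - lower)
    if not hits:
        raise ValueError("no observed targets to evaluate")
    return sum(hits) / len(hits), sum(widths) / len(widths)
```

With these defaults, each timestamp maps to eight features (two periods, two harmonics, sine and cosine each). For a fixed nominal level, an interval is better calibrated when its empirical coverage is closer to that level, and sharper when its mean width is smaller at comparable coverage.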
