Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization
Inha Kang, Eunki Kim, Wonjeong Ryu, Jaeyo Shin, Seungjun Yu, Yoon-Hee Kang, Seongeun Jeong, Eunhye Kim, Soontae Kim, Hyunjung Shim
TL;DR
The paper tackles long-horizon particulate matter (PM) forecasting in East Asia, where global foundation models struggle with regional dynamics and real-time constraints. It introduces CMAQ-OBS, a regional dataset that pairs real-world observations (OBS) with CMAQ reanalysis for real-time initialization. It also presents FAKER-Air, a two-stage pipeline. Stage 1 performs supervised fine-tuning (SFT) enhanced with Temporal Accumulation, and Stage 2 applies Group-Relative Policy Optimization (GRPO) with curriculum rollout and class-wise AQI rewards. Compared with the SFT-only baseline, the approach reduces the false alarm rate by 47.3% while keeping a competitive F1-score, which improves operational reliability for multi-day air-quality warnings. By combining physics-based regional modeling with decision-centered optimization, the work enables practical, region-specific, real-time long-horizon forecasts with direct public-health relevance.
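To make the Stage 2 idea concrete, the sketch below shows one way a group-relative advantage could be computed from a class-wise reward. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes PM2.5 fields are mapped to four classes using hypothetical boundaries of 15, 35 and 75 µg/m³. It also assumes a reward equal to the fraction of grid cells whose predicted class matches the observed class. Finally, it uses the standard GRPO normalization, in which each sampled forecast's reward is compared with the mean and standard deviation of its own group. The paper's actual reward (including any asymmetry between false alarms and misses) and its curriculum rollout are not reproduced here. All function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: illustrative PM2.5 class boundaries in ug/m^3, not taken from the paper.
# Classes: 0 = good, 1 = moderate, 2 = bad, 3 = very bad.
PM25_BINS = np.array([15.0, 35.0, 75.0])


def to_aqi_class(pm25: np.ndarray) -> np.ndarray:
    """Map PM2.5 concentrations to integer classes (upper bin edge inclusive)."""
    return np.digitize(pm25, PM25_BINS, right=True)


def class_wise_reward(pred: np.ndarray, obs: np.ndarray) -> float:
    """Hypothetical class-wise reward: fraction of grid cells whose predicted
    class matches the observed class."""
    return float(np.mean(to_aqi_class(pred) == to_aqi_class(obs)))


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantage: normalize each rollout's reward by the mean and
    standard deviation of its own group (no learned value function)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Usage on synthetic data: one "observed" field and G = 4 sampled forecasts
# for the same initial state, each perturbed with a different noise level.
rng = np.random.default_rng(0)
obs = rng.uniform(0.0, 120.0, size=(8, 8))
group = [obs + rng.normal(0.0, s, size=obs.shape) for s in (2.0, 10.0, 25.0, 50.0)]
rewards = np.array([class_wise_reward(p, obs) for p in group])
adv = group_relative_advantages(rewards)
print("rewards:", rewards)
print("advantages:", adv)
```

Because the baseline is the group's own mean reward, this formulation needs no separate value network. Forecasts that classify more cells correctly than their sibling samples receive positive advantages and are reinforced, and the others are pushed down.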
Abstract
Accurate long-horizon forecasting of particulate matter (PM) concentration fields is essential for operational public health decisions. However, reliable forecasts remain difficult to achieve in regions with complex terrain and strong atmospheric dynamics, such as East Asia. Although foundation models such as Aurora offer global generality, they often miss region-specific dynamics and rely on non-real-time inputs, which limits their practical utility for localized warning systems. To address this gap, we construct and release CMAQ-OBS, a high-resolution dataset for East Asia that combines real-world observations with CMAQ reanalysis. It reduces regional error by 59.5% and enables real-time 48- to 120-hour forecasts, which are critical for public health alerts. Standard point-wise objectives, however, cannot reflect asymmetric operational costs: false alarms erode public trust, while missed severe events endanger populations. This cost mismatch causes supervised fine-tuning (SFT) models to over-predict, yielding high False Alarm Rates. We therefore apply Group-Relative Policy Optimization (GRPO) with class-wise rewards and curriculum rollout to align predictions with operational priorities. Experimental results show that our framework significantly improves forecast reliability. Compared with the SFT-only baseline, our model reduces the False Alarm Rate by 47.3% while achieving a competitive F1-score, demonstrating its effectiveness for practical, real-world air quality forecasting at long lead times.
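The abstract reports False Alarm Rate and F1-score but does not define them at this point. The sketch below assumes the standard contingency-table definitions used in forecast verification. FAR is taken as the false alarm ratio FP / (TP + FP), and F1 as 2TP / (2TP + FP + FN), for a binary "warning" event that occurs when the AQI class reaches an assumed cut-off (`warn_level`). The paper's exact FAR definition, warning threshold, and aggregation over stations and lead times may differ. The function names and the toy class sequences are hypothetical and synthetic, not results from the paper.

```python
import numpy as np


def warning_scores(pred_cls, obs_cls, warn_level=2):
    """Contingency-table scores for a binary "warning issued" event.

    ASSUMPTION: an event occurs where the AQI class is >= warn_level.
    FAR is the false alarm ratio FP / (TP + FP); F1 = 2TP / (2TP + FP + FN).
    """
    pred_evt = np.asarray(pred_cls) >= warn_level
    obs_evt = np.asarray(obs_cls) >= warn_level
    tp = int(np.sum(pred_evt & obs_evt))    # warning issued, event observed
    fp = int(np.sum(pred_evt & ~obs_evt))   # warning issued, no event (false alarm)
    fn = int(np.sum(~pred_evt & obs_evt))   # no warning, event observed (miss)
    far = fp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "far": far, "f1": f1}


def relative_reduction(baseline, new):
    """Relative reduction of a metric, e.g. 0.50 -> 0.25 is a 50% reduction."""
    return (baseline - new) / baseline


# Synthetic AQI classes (0 = good ... 3 = very bad); not data from the paper.
obs = [0, 1, 2, 3, 2, 1, 0, 3]
over_predicting = [2, 2, 2, 3, 2, 2, 0, 3]    # issues extra warnings
better_calibrated = [0, 2, 2, 3, 1, 1, 0, 3]  # fewer false alarms, one miss

a = warning_scores(over_predicting, obs)
b = warning_scores(better_calibrated, obs)
print(a)
print(b)
print(f"FAR relative reduction: {relative_reduction(a['far'], b['far']):.1%}")
```

In this toy case, the over-predicting forecast catches every event but has FAR = 3/7. The calibrated one lowers FAR to 1/4, roughly a 41.7% relative reduction, at the cost of one miss, while F1 moves from 8/11 to 0.75. This mirrors, in miniature, the trade-off the abstract describes: fewer false alarms without giving up overall detection skill.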
