Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization

Inha Kang, Eunki Kim, Wonjeong Ryu, Jaeyo Shin, Seungjun Yu, Yoon-Hee Kang, Seongeun Jeong, Eunhye Kim, Soontae Kim, Hyunjung Shim

TL;DR

The paper tackles long-horizon PM forecasting in East Asia, where global foundation models struggle with regional dynamics and real-time constraints. It introduces CMAQ-OBS, a regional dataset pairing observations (OBS) with CMAQ reanalysis for real-time initialization, and presents FAKER-Air, a two-stage pipeline: Stage 1 is Temporal Accumulation-enhanced supervised fine-tuning (SFT), and Stage 2 is Group-Relative Policy Optimization (GRPO) with curriculum rollout and class-wise AQI rewards. The approach yields a 47.3% reduction in False Alarm Rate relative to the SFT-only baseline while keeping a competitive F1-score, improving operational reliability for multi-day air-quality warnings. By fusing physics-based regional modeling with decision-centered optimization, the work enables practical, region-specific, real-time long-horizon forecasts with direct public health relevance.

Abstract

Accurate long-horizon forecasting of particulate matter (PM) concentration fields is essential for operational public health decisions. However, achieving reliable forecasts remains challenging in regions with complex terrain and strong atmospheric dynamics, such as East Asia. While foundation models such as Aurora offer global generality, they often miss region-specific dynamics and rely on non-real-time inputs, limiting their practical utility for localized warning systems. To address this gap, we construct and release CMAQ-OBS, a dataset for East Asia that pairs real-world observations with high-resolution CMAQ fields, reducing regional error by 59.5% and enabling the real-time 48-120 hour forecasts critical for public health alerts. Data alone is not sufficient, though: standard point-wise objectives cannot reflect asymmetric operational costs, where false alarms erode public trust while missed severe events endanger populations. This cost mismatch causes SFT models to over-predict and yield high False Alarm Rates. We introduce Group-Relative Policy Optimization (GRPO) with class-wise rewards and curriculum rollout to align predictions with operational priorities. Experimental results demonstrate that our framework significantly improves forecast reliability. Compared to the SFT-only baseline, our model reduces the False Alarm Rate by 47.3% while achieving a competitive F1-score, demonstrating its effectiveness for practical, real-world air quality forecasting in long-lead-time scenarios.
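To make the two headline metrics concrete, the following is a minimal sketch of how a False Alarm Rate and an F1-score could be computed from binary high-pollution alarms. The abstract does not define either metric, so the definition used here (false alarms divided by all issued alarms) is an assumption, and the function name `alarm_metrics` and the sample inputs are hypothetical.

```python
def alarm_metrics(predicted, observed):
    """Derive FAR and F1 from paired binary alarms (True = high-pollution alarm).

    Assumption: FAR = FP / (TP + FP), i.e. the share of issued alarms that
    were false. The paper may define the metric differently.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)        # correct alarms
    fp = sum(1 for p, o in pairs if p and not o)    # false alarms
    fn = sum(1 for p, o in pairs if not p and o)    # missed events

    issued = tp + fp
    events = tp + fn
    far = fp / issued if issued else 0.0
    precision = tp / issued if issued else 0.0
    recall = tp / events if events else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"far": far, "f1": f1}


# Illustrative inputs only, not data from the paper.
predicted = [True, True, True, False, False]
observed = [True, False, True, True, False]
metrics = alarm_metrics(predicted, observed)
```

Under this definition an over-predicting model raises `fp` and therefore FAR, while F1 also penalizes the missed events counted in `fn`, which is why the two are reported together.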

Paper Structure

This paper contains 24 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Illustration of PM2.5 predictions. Our method effectively captures the dynamic temporal variations in PM concentration over time, whereas Aurora (Bodnar et al., 2025) fails to reflect such changes.
  • Figure 2: Seasonal L1 error relative to OBS ($\downarrow$ is better). Unlike the global dataset (CAMS), our locally developed dataset (CMAQ) shows low error relative to real observations.
  • Figure 3: Comparison of CAMS and CMAQ (ours) with OBS. CMAQ achieves lower regional error and near real-time availability, enabling stable long horizon forecasting.
  • Figure 4: Overall pipeline of FAKER-Air. Our two-stage training framework begins with supervised fine-tuning (SFT) with rollout loss, followed by Group-Relative Policy Optimization (GRPO). During GRPO, multiple trajectory groups are evaluated using AQI-based rewards to guide policy updates, with the rollout horizon gradually increasing to enable long-term predictions.
  • Figure 5: Qualitative comparison of long horizon PM2.5 forecasts over East Asia. Aurora rapidly loses regional structure. SFT restores coherent transport but slightly overextends moderate pollution. GRPO prunes these artifacts while preserving high‑pollution cores.
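
The GRPO stage described in the Figure 4 caption can be illustrated with a small sketch: several candidate trajectories are scored with a class-wise reward, and each reward is normalized against its own group. The normalization (reward minus group mean, divided by group standard deviation) is the standard group-relative advantage. Everything else here is an assumption made for illustration: the PM2.5 class edges, the penalty weights, and the function names are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev

# Assumed PM2.5 class edges in ug/m3 (four classes); the paper's AQI bins may differ.
PM25_CLASS_EDGES = (15.0, 35.0, 75.0)


def aqi_class(pm25):
    """Map a concentration to a class index 0..3 by counting exceeded edges."""
    return sum(pm25 > edge for edge in PM25_CLASS_EDGES)


def trajectory_reward(predicted, observed, false_alarm_penalty=1.0, miss_penalty=1.0):
    """Hypothetical class-wise reward averaged over the rollout steps.

    +1 for a correct class, minus a penalty for over-prediction (false alarm)
    or under-prediction (miss). Unequal penalties would encode asymmetric costs.
    """
    total = 0.0
    for p, o in zip(predicted, observed):
        pc, oc = aqi_class(p), aqi_class(o)
        if pc == oc:
            total += 1.0
        elif pc > oc:
            total -= false_alarm_penalty
        else:
            total -= miss_penalty
    return total / len(observed)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative values only: one observed sequence and a group of three rollouts.
observed = [10.0, 40.0, 80.0]
group = [
    [12.0, 38.0, 90.0],  # all classes correct
    [20.0, 40.0, 80.0],  # one over-prediction
    [10.0, 30.0, 60.0],  # two under-predictions
]
rewards = [trajectory_reward(traj, observed) for traj in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are relative to the group, a trajectory is reinforced only when it scores better than its siblings sampled from the same initial state. The curriculum rollout mentioned in the caption would correspond to lengthening `observed` and each trajectory as training proceeds.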