AGCD: Agent-Guided Cross-Modal Decoding for Weather Forecasting

Jing Wu, Yang Liu, Lin Zhang, Junbo Zeng, Jiabin Wang, Zi Ye, Guowen Li, Shilei Cao, Jiashun Cheng, Fang Wang, Meng Jin, Yerong Feng, Hong Cheng, Yutong Lu, Haohuan Fu, Juepeng Zheng

Abstract

Accurate weather forecasting is more than grid-wise regression: it must preserve coherent synoptic structures and the physical consistency of meteorological fields, especially under autoregressive rollouts, where small one-step errors can amplify into structural bias. Existing physics-priors approaches typically impose global, once-and-for-all constraints via architectures, regularization, or NWP coupling, offering limited state-adaptive and sample-specific controllability at deployment. To bridge this gap, we propose Agent-Guided Cross-modal Decoding (AGCD), a plug-and-play decoding-time prior-injection paradigm that derives state-conditioned physics-priors from the current multivariate atmosphere and injects them into forecasters in a controllable and reusable way. Specifically, we design a multi-agent meteorological narration pipeline that generates state-conditioned physics-priors, using multimodal large language models (MLLMs) to extract meteorological elements. To apply the priors, AGCD further introduces cross-modal region interaction decoding, which performs region-aware multi-scale tokenization and efficient physics-priors injection to refine visual features without changing the backbone interface. Experiments on WeatherBench demonstrate consistent gains for 6-hour forecasting across two resolutions (5.625° and 1.40625°) and diverse backbones (generic and weather-specialized), including strictly causal 48-hour autoregressive rollouts, where AGCD reduces early-stage error accumulation and improves long-horizon stability.
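To make the rollout setting concrete, the following is a minimal sketch of a strictly causal autoregressive rollout with 6-hour steps out to 48 hours, in which each step sees only the previous prediction. It is an illustration only, not the authors' implementation: `rollout`, `toy_truth`, and `toy_model` are hypothetical names, and the scalar dynamics (growth factor 1.05, one-step bias 0.01) are assumptions chosen to show how a small one-step error accumulates.

```python
import numpy as np

def rollout(step_fn, x0, horizon_hours=48, step_hours=6):
    """Strictly causal rollout: every step is fed only the previous output."""
    n_steps = horizon_hours // step_hours  # 48 h / 6 h = 8 steps
    states = [x0]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)  # shape: (n_steps + 1, *x0.shape)

# Hypothetical toy dynamics (assumption, not a weather model).
def toy_truth(x):
    return 1.05 * x

# Hypothetical forecaster: same dynamics plus a small one-step bias.
def toy_model(x):
    return 1.05 * x + 0.01

x0 = np.ones((4, 8))            # stand-in for a gridded field
truth = rollout(toy_truth, x0)
pred = rollout(toy_model, x0)

# RMSE per lead time; the one-step bias compounds along the rollout.
rmse = np.sqrt(((pred - truth) ** 2).mean(axis=(1, 2)))
for lead, err in zip(range(0, 54, 6), rmse):
    print(f"t+{lead:02d}h  RMSE={err:.4f}")
```

In this toy setting the error grows at every step even though the one-step bias is fixed, which is the kind of accumulation the rollout experiments are meant to measure.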

Paper Structure (31 sections, 14 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Global static physics-priors vs. state-conditioned physics-priors: the proposed AGCD injects cached state-conditioned physics-priors at decoding time.
  • Figure 2: Overview of the proposed AGCD.
  • Figure 3: Structure of Cross-Modal Interaction.
  • Figure 4: Qualitative comparison of 6-hour weather forecasting with Pangu and Pangu+AGCD (AGCD) on 1.40625$^\circ$ data across multiple variables. (a) Initial fields at time $t$. (b) Ground-truth targets at $t{+}6$h. (c) Predictions from vanilla Pangu. (d) Error maps from vanilla Pangu. (e) Predictions from Pangu with our AGCD. (f) Error maps from Pangu with our AGCD. Error maps visualize Pred$-$GT.
  • Figure 5: Autoregressive rollout comparison between Pangu and Pangu+AGCD up to 48 hours (6-hour steps).
  • ...and 2 more figures