ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting

Jindong Tian, Yifei Ding, Ronghui Xu, Hao Miao, Chenjuan Guo, Bin Yang

TL;DR

ARROW tackles long-horizon global weather forecasting by addressing two key challenges: insufficient multi-scale spatiotemporal modeling and inflexible autoregression. It introduces a Multi-Interval Forecasting Model (MIFM) that combines Ring Positional Encoding (RPE) with a Shared-Private Mixture-of-Experts (S&P MoE) to jointly capture shared and interval-specific atmospheric dynamics across multiple time horizons. Complementing this, an Adaptive Rollout Scheduler (AR Scheduler) learns to select forecast intervals via a Deep Q-Network, balancing error accumulation against the need to capture rapid atmospheric changes. Training uses an alternating optimization framework that couples the forecasting model and the rollout policy. On WeatherBench ERA5 data, ARROW achieves state-of-the-art performance with roughly 10% gains in RMSE and ACC, and ablations confirm the critical roles of RPE, S&P MoE, auxiliary MoE losses, and adaptive rollout. The approach demonstrates the practicality of adaptive, multi-scale routing for data-driven global weather forecasting and sets the stage for physics-informed extensions and local forecast applications.
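To make the rollout idea concrete, the following is a minimal sketch of an adaptive rollout loop, not the authors' implementation. The candidate intervals (6, 12, and 24 hours), the function names, and `score_fn` are assumptions for illustration; `score_fn` stands in for the learned Deep Q-Network, and `step_fn` stands in for the multi-interval forecasting model.

```python
from typing import Callable, List, Sequence, Tuple


def naive_schedule(target_hours: int, interval: int = 6) -> List[int]:
    """Fixed-interval rollout: repeat one interval until the lead time is reached."""
    if target_hours % interval != 0:
        raise ValueError("target_hours must be a multiple of interval")
    return [interval] * (target_hours // interval)


def adaptive_rollout(
    state: float,
    target_hours: int,
    intervals: Sequence[int],
    score_fn: Callable[[float, int], float],
    step_fn: Callable[[float, int], float],
) -> Tuple[float, List[int]]:
    """Roll forward to target_hours, picking the best-scoring feasible interval each step.

    score_fn(state, dt) is a hypothetical stand-in for a learned Q-value;
    step_fn(state, dt) is a hypothetical stand-in for a dt-hour forecast.
    """
    elapsed = 0
    schedule: List[int] = []
    while elapsed < target_hours:
        remaining = target_hours - elapsed
        # Only intervals that do not overshoot the target lead time are allowed.
        feasible = [dt for dt in intervals if dt <= remaining]
        if not feasible:
            raise ValueError("no interval fits the remaining lead time")
        dt = max(feasible, key=lambda d: score_fn(state, d))
        state = step_fn(state, dt)
        elapsed += dt
        schedule.append(dt)
    return state, schedule


# Toy usage: the "state" is just the elapsed time, and the scorer always
# prefers the longest feasible interval (fewest autoregressive steps).
final_state, schedule = adaptive_rollout(
    state=0.0,
    target_hours=138,
    intervals=[6, 12, 24],
    score_fn=lambda s, dt: float(dt),
    step_fn=lambda s, dt: s + dt,
)
print(len(naive_schedule(138)))  # 23 steps with a fixed 6-hour interval
print(schedule)                  # [24, 24, 24, 24, 24, 12, 6]
```

In this toy setting the scorer ignores the state, so the schedule is fixed. A learned scorer would instead vary its choice with the current weather state, choosing shorter intervals when conditions change rapidly and longer ones when limiting error accumulation matters more.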

Abstract

Weather forecasting is a fundamental task in spatiotemporal data analysis, with broad applications across a wide range of domains. Existing data-driven forecasting methods typically model atmospheric dynamics over a fixed short time interval (e.g., 6 hours) and rely on naive autoregression-based rollout for long-term forecasting (e.g., 138 hours). However, this paradigm suffers from two key limitations: (1) it often inadequately models the spatial and multi-scale temporal dependencies inherent in global weather systems, and (2) the rollout strategy struggles to balance error accumulation with the capture of fine-grained atmospheric variations. In this study, we propose ARROW, an Adaptive-Rollout Multi-scale temporal Routing method for Global Weather Forecasting. To contend with the first limitation, we construct a multi-interval forecasting model that forecasts weather across different time intervals. Within the model, the Shared-Private Mixture-of-Experts captures both shared patterns and specific characteristics of atmospheric dynamics across different time scales, while Ring Positional Encoding accurately encodes the circular latitude structure of the Earth when representing spatial information. For the second limitation, we develop an adaptive rollout scheduler based on reinforcement learning, which selects the most suitable time interval to forecast according to the current weather state. Experimental results demonstrate that ARROW achieves state-of-the-art performance in global weather forecasting, establishing a promising paradigm in this field.
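The abstract does not define Ring Positional Encoding, so the sketch below shows only a generic periodic encoding of a circular coordinate, as one way to see why wrap-around matters. The function names and the choice of four harmonics are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List


def circular_encoding(angle_deg: float, num_harmonics: int = 4) -> List[float]:
    """Encode an angle as sin/cos pairs so that 0 and 360 degrees coincide."""
    theta = math.radians(angle_deg)
    features: List[float] = []
    for k in range(1, num_harmonics + 1):
        features.append(math.sin(k * theta))
        features.append(math.cos(k * theta))
    return features


def encoding_distance(a_deg: float, b_deg: float, num_harmonics: int = 4) -> float:
    """Euclidean distance between the encodings of two angles."""
    ea = circular_encoding(a_deg, num_harmonics)
    eb = circular_encoding(b_deg, num_harmonics)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))


# Grid points on either side of the wrap-around seam stay close in encoding
# space, whereas a raw coordinate would place 359 and 1 degrees 358 apart.
print(encoding_distance(359.0, 1.0) < encoding_distance(0.0, 180.0))  # True
```

A plain linear position index would treat the two ends of a circular axis as maximally distant. A periodic encoding removes that artificial seam, which is the property the abstract attributes to Ring Positional Encoding.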

Paper Structure

This paper contains 31 sections, 17 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Different rollout schemes in weather forecasting. (a) SIFM with naive rollout. (b) Three SIFMs with greedy rollout. (c) Multi-interval forecasting model (MIFM) with adaptive rollout.
  • Figure 2: The overall framework of ARROW.
  • Figure 3: Effect of Adaptive Rollout Scheduler on T2m and T850 at a 138-hour lead time.
  • Figure 4: Visualization of T2m during the Siberian cold wave (unit: °C).
  • Figure 5: Visualization of TCC in China (unitless).
  • ...and 1 more figure