FlowDA: Accurate, Low-Latency Weather Data Assimilation via Flow Matching
Ran Cheng, Lailai Zhu
TL;DR
FlowDA addresses the computational bottleneck of traditional data assimilation by using flow matching to produce fast, data-driven analyses conditioned on sparse observations. It combines a SetConv-based observation embedding with a fine-tuned Aurora foundation model to learn a velocity field that transports a background state to an analysis. FlowDA outperforms baselines of similar tunable-parameter size on single-step and long-horizon cycling tasks across observation rates from $3.9\%$ down to $0.1\%$, remains robust to observational noise, and offers substantial speed advantages. The results point to a scalable, data-driven approach to weather-scale data assimilation for ML-based forecasting pipelines.
Abstract
Data assimilation (DA) is a fundamental component of modern weather prediction, yet it remains a major computational bottleneck in machine learning (ML)-based forecasting pipelines because these pipelines still rely on traditional variational methods. Recent generative ML-based DA methods offer a promising alternative, but they typically require many sampling steps and suffer from error accumulation in long-horizon auto-regressive rollouts with cycling assimilation. We propose FlowDA, a low-latency, weather-scale generative DA framework based on flow matching. FlowDA conditions on observations through a SetConv-based embedding and fine-tunes the Aurora foundation model to deliver accurate, efficient, and robust analyses. In experiments with observation rates ranging from $3.9\%$ down to $0.1\%$, FlowDA outperforms strong baselines with a similar number of tunable parameters. FlowDA is also robust to observational noise and remains stable in long-horizon auto-regressive cycling DA. Overall, FlowDA points to an efficient and scalable direction for data-driven DA.
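
To make the flow-matching idea concrete, the following is a minimal NumPy sketch under stated assumptions, not the FlowDA implementation. It assumes a straight-line probability path between a background state and an analysis, which is a common flow-matching choice but is not confirmed by the text above. The function names (`interpolate`, `target_velocity`, `euler_integrate`, `oracle_velocity`) and the toy array shapes are hypothetical, and the oracle velocity stands in for the trained, observation-conditioned network.

```python
import numpy as np

def interpolate(x_b, x_a, t):
    """Point on the assumed straight path from background x_b (t=0) to analysis x_a (t=1)."""
    return (1.0 - t) * x_b + t * x_a

def target_velocity(x_b, x_a):
    """Regression target on the straight path: d/dt interpolate(x_b, x_a, t) = x_a - x_b."""
    return x_a - x_b

def euler_integrate(velocity_fn, x_b, n_steps=4):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step forward Euler."""
    x = np.array(x_b, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

# Toy data (hypothetical shapes): 4 "fields" with 8 grid points each.
rng = np.random.default_rng(0)
x_a = rng.normal(size=(4, 8))              # reference analysis
x_b = x_a + 0.5 * rng.normal(size=(4, 8))  # background = analysis + error

def oracle_velocity(x, t):
    """Stand-in for a trained network; a real model would also take observations as input."""
    return target_velocity(x_b, x_a)

# A few Euler steps carry the background to the analysis when the velocity is exact.
x_hat = euler_integrate(oracle_velocity, x_b, n_steps=4)
```

In training, a network would be regressed onto `target_velocity` at points returned by `interpolate` for randomly drawn `t`. Because the assumed path is straight, few integration steps are needed at inference time, which is the intuition behind the low-latency claim.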
