CLIMATEAGENT: Multi-Agent Orchestration for Complex Climate Data Science Workflows
Hyeonjae Kim, Chenyue Li, Wen Deng, Mengxi Jin, Wen Huang, Mengqian Lu, Binhang Yuan
TL;DR
ClimateAgent addresses the bottleneck of climate analytics over massive, heterogeneous data with a specialized multi-agent orchestration framework that decomposes complex questions into executable subtasks handled by domain-aware agents. The Orchestrate-Agent, Plan-Agent, Data-Agents, and Coding-Agent provide dynamic API awareness, persistent context, and a self-correcting execution loop, delivering robust end-to-end climate workflows. On the Climate-Agent-Bench-85 benchmark, ClimateAgent achieves 100% task completion with a report quality score of 8.32, outperforming strong baselines such as GitHub Copilot (6.27) and GPT-5 (3.26), with gains in planning, coordination, and correction across diverse climate phenomena. The framework advances reproducible, reliable climate science automation, reducing manual effort and enabling rapid, hypothesis-driven analysis across atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature (SST), and tropical cyclones, with practical implications for adaptation and policy.
Abstract
Climate science demands automated workflows to transform comprehensive questions into data-driven statements across massive, heterogeneous datasets. However, generic LLM agents and static scripting pipelines lack climate-specific context and flexibility, and therefore perform poorly in practice. We present ClimateAgent, an autonomous multi-agent framework that orchestrates end-to-end climate data analytic workflows. ClimateAgent decomposes user questions into executable sub-tasks coordinated by an Orchestrate-Agent and a Plan-Agent; acquires data via specialized Data-Agents that dynamically introspect APIs to synthesize robust download scripts; and completes analysis and reporting with a Coding-Agent that generates Python code, visualizations, and a final report with a built-in self-correction loop. To enable systematic evaluation, we introduce Climate-Agent-Bench-85, a benchmark of 85 real-world tasks spanning atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature, and tropical cyclones. On Climate-Agent-Bench-85, ClimateAgent achieves 100% task completion and a report quality score of 8.32, outperforming GitHub Copilot (6.27) and a GPT-5 baseline (3.26). These results demonstrate that multi-agent orchestration with dynamic API awareness and self-correcting execution substantially advances reliable, end-to-end automation for climate science analytic tasks.
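
To make the idea of a self-correcting execution loop concrete, the following is a minimal sketch and not the ClimateAgent implementation. It assumes that generated code is run, that any failure is captured as a traceback, and that the code and traceback are handed to a revision step before retrying. The names `run_with_self_correction`, `revise`, and `toy_revise` are hypothetical; `toy_revise` is a hard-coded stand-in for what would be an LLM call in a real system.

```python
import traceback
from typing import Callable


def run_with_self_correction(
    code: str,
    revise: Callable[[str, str], str],
    max_attempts: int = 3,
) -> dict:
    """Run `code`; on failure, ask `revise` for a new version and retry."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        namespace: dict = {}
        try:
            exec(code, namespace)  # execute the candidate script
            return {
                "ok": True,
                "attempts": attempt,
                "result": namespace.get("result"),
                "errors": errors,
            }
        except Exception:
            error = traceback.format_exc()
            errors.append(error)
            # Hypothetical revision step: receives the failing code and its traceback.
            code = revise(code, error)
    return {"ok": False, "attempts": max_attempts, "result": None, "errors": errors}


def toy_revise(code: str, error: str) -> str:
    """Stand-in for an LLM-based fixer (assumption): patches one known mistake."""
    if "NameError" in error:
        return "import statistics\n" + code
    return code


# The first attempt fails because `statistics` is not imported.
buggy = "result = statistics.mean([1.0, 2.0, 3.0])"
outcome = run_with_self_correction(buggy, toy_revise)
print(outcome["ok"], outcome["attempts"], outcome["result"])
```

Capping the number of attempts keeps the loop from running indefinitely when the revision step cannot repair the failure, and retaining the collected tracebacks leaves a record of what was tried.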
