A Self-Evolving AI Agent System for Climate Science

Zijie Guo, Jiong Wang, Fenghua Ling, Wangxu Wei, Xiaoyu Yue, Zhe Jiang, Wanghan Xu, Jing-Jia Luo, Lijing Cheng, Yoo-Geun Ham, Fengfei Song, Pierre Gentine, Toshio Yamagata, Ben Fei, Wenlong Zhang, Xinyu Gu, Chao Li, Yaqiang Wang, Tao Chen, Wanli Ouyang, Bowen Zhou, Lei Bai

TL;DR

EarthLink presents a self-evolving AI system designed as a copilot for climate science, integrating planning, code generation, data analysis, and physical reasoning to address data fragmentation across Earth system domains. Its architecture comprises Planning, Scientific Diagnosis, and Multi-Scenario Analysis modules, underpinned by Knowledge, Data, and Tool Libraries, with GPT‑5 as the foundation model and per-module parameter customization. Through extensive evaluations on 36 climate tasks and a fully open-ended Atlantic Niño discovery scenario, EarthLink demonstrates near‑junior-researcher proficiency, autonomous hypothesis generation, and iterative self-improvement, while highlighting limitations and the need for transparent human oversight. The framework envisions a new human–AI research paradigm that can compress scientific discovery timelines, empower broader participation, and produce transparent, reproducible workflows; the work provides a public platform and open tooling to accelerate Earth sciences research. The overall score is conceptually defined as the mean regional reliability across tasks, $\mathrm{Score}_{\mathrm{overall}} = \frac{1}{M}\sum_{m=1}^{M} S_m$, where each $S_m$ denotes a per-region performance metric, enabling quantitative cross-domain comparisons of AI-assisted climate science.
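
As a minimal illustration of the averaging in the formula above, the sketch below assumes the per-region scores $S_m$ are already available as plain numbers. The function name `overall_score` and the example values are hypothetical and do not come from the paper.

```python
from statistics import fmean


def overall_score(region_scores):
    """Return the unweighted mean of the per-region scores S_1..S_M."""
    scores = list(region_scores)
    if not scores:
        raise ValueError("at least one per-region score is required")
    return fmean(scores)


# Hypothetical per-region scores, for illustration only.
example_scores = [0.5, 0.75, 1.0]
print(overall_score(example_scores))
```

An unweighted mean treats every region equally; a weighted variant (for example by area) would be a different design choice that the summary does not describe.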

Abstract

Scientific progress in Earth science depends on integrating data across the planet's interconnected spheres. However, the accelerating volume and fragmentation of multi-sphere knowledge and data have surpassed human analytical capacity. This creates a major bottleneck for discovery, especially in climate science. To address this challenge, we introduce EarthLink, the first self-evolving AI agent system designed as an interactive "copilot" for Earth scientists. Through natural language interaction, EarthLink automates the entire research workflow by integrating planning, code execution, data analysis, and physical reasoning into a unified process that directly addresses this limitation. Beyond efficiency, it exhibits human-like cross-disciplinary analytical ability and achieves proficiency comparable to a junior researcher in expert evaluations on core large-scale climate tasks, including model-observation comparison and climate change understanding. When tasked with an open scientific problem, specifically the discovery of precursors of the Atlantic Niño, EarthLink autonomously developed a research strategy, identified sources of predictability, verified its hypotheses with available data, and proposed a physically consistent mechanism. These emerging capabilities enable a new human-AI research paradigm. Scientists can focus on value and result judgments, while AI systems handle complex data analysis and knowledge integration. This accelerates the pace and breadth of discovery in Earth sciences. The system is accessible at our website https://earthlink.intern-ai.org.cn.

Paper Structure

This paper contains 52 sections, 3 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: The EarthLink workflow for automated climate data analysis. a, In the Planning Module, the system first receives a user's scientific request and optional relevant literature, from which the Planning Agent generates multiple candidate experimental plans. The Plan Aggregation Agent, with optional human supervision, then reviews and integrates these plans to form an optimal final plan. b, The Scientific Diagnosis Module executes this plan. A Coding Agent automatically writes code, processing data and referring to tools from the Resource Libraries to perform computations and visualizations. This module features a built-in feedback loop, enabling autonomous debugging and iterative refinement through Result Checking and Image Feedback. c, In the Multi-Scenario Analysis Module, the Analysis & Summary Agents conduct an in-depth analysis of the scientific results from the preceding module, and synthesize these insights into well-structured and comprehensive final reports. These reports can further provide scientific interpretations across various domains, including energy, agriculture, environment, and insurance, while delivering insights relevant to policy-making. d, The Resource Libraries serve as the foundational support for the entire workflow. They include: a Knowledge Library, which provides previously validated outputs (such as plans and code scripts), API documentation for relevant packages, and online web resources; a Data Library containing curated scientific datasets (e.g., CMIP6 data and observational records); and a Tool Library offering reusable functions, algorithms, and tools for data loading, computation, plotting, and more.
  • Figure 1: Evaluation of CMIP6 models in simulating seasonal cycles of monthly precipitation. a, Task definition and diagnostic requirements, including annual mean and seasonal cycles and regional time series analyses. b, Automated planning output from EarthLink, detailing end-to-end workflow from data acquisition to deliverables. c, Example code snippet generated by the system for data processing and analysis. d, Modelled and observed precipitation seasonal cycles over a selected region. e, Automated textual interpretation of the results, providing a plain-language summary generated by the system. A more complete case can be found in the Supplementary Information Section C.1.1.
  • Figure 2: Multi-level evaluation of EarthLink on a number of core climate analysis tasks. a, Level 1: Multisphere statistical feature comparison. EarthLink conducts diagnostic analyses across domains by comparing the CMIP6 simulation of climatological features, such as spatial patterns and variabilities, with observations. Examples include seasonal cycles of precipitation, cloud radiative effects, global temperature change, ocean heat content (OHC) timeseries, 20°C isotherm depth, Arctic ice climatology, Antarctic surface albedo, and runoff. b, Level 2: Mechanistic diagnosis. EarthLink estimates scenario-driven metrics such as equilibrium climate sensitivity (ECS) and transient climate response (TCR), demonstrating its ability to extract relevant datasets and implement standard diagnostic methods. c, Level 3: Physical process diagnosis. The system performs advanced analyses such as ENSO diversity classification and period detection, displaying emergent capacity in physical reasoning and chain-of-thought synthesis. Note that most of the image elements are directly produced by EarthLink, and the others are only slightly adjusted in layout. More results are shown in Supplementary Information Section C.
  • Figure 2: Benchmarking CMIP6 model simulation of cloud radiative effects (CRE) against ISCCP-FH observations. a, Task setup, requiring statistical mapping of climatology and variability and model ranking based on performance metrics. b, End-to-end automated planning by EarthLink, including data acquisition, computation, ranking, and visualization steps. c, Representative code snippet for automated CRE analysis. d, Global maps and zonal means of CRE climatology. e, Automated textual summary of findings, demonstrating EarthLink’s interpretative and reporting capabilities. A more complete case can be found in the Supplementary Information Section C.1.2.
  • Figure 3: Application of EarthLink to tackle future-oriented climate research challenges. a, Climate change detection, attribution, and future projection. EarthLink processes multi-model CMIP6 simulations under various experiments, accurately distinguishing between the effects of natural and anthropogenic forcings and generating global temperature anomaly timeseries. b, Constrained projections of future surface temperature for selected regions. Using hierarchical emergent constraints (HEC) and spatial aggregation approaches, EarthLink reduces projection uncertainty for city-level temperatures under the SSP2-4.5 scenario (2041–2060). c, Constrained projections of future temperature changes in Africa using constraining factors automatically identified by EarthLink. Note that most of the image elements in a–c are directly produced by EarthLink, and the others are only slightly adjusted in layout. d, Differentiated task scorecard. The system’s performance across evaluation tasks is summarized, highlighting relative strengths in planning, coding, and visualization.
  • ...and 11 more figures
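
The built-in feedback loop described for the Scientific Diagnosis Module (workflow figure, panel b) can be pictured with a small sketch: generate code, execute it, check the result, and feed any error back into the next attempt. This is only an illustration of the general generate-execute-check pattern, not EarthLink's implementation. The names `run_with_feedback`, `stub_generate`, and `stub_check` are hypothetical, and the stub functions stand in for the Coding Agent and Result Checking steps.

```python
import traceback


def run_with_feedback(generate_code, check_result, max_rounds=3):
    """Generate code, execute it, and retry with feedback until the check passes."""
    feedback = None
    for attempt in range(1, max_rounds + 1):
        source = generate_code(feedback)
        namespace = {}
        try:
            # Illustration only: a real system would run this in a sandbox.
            exec(source, namespace)
        except Exception:
            # Pass the traceback back to the generator for the next round.
            feedback = traceback.format_exc()
            continue
        ok, feedback = check_result(namespace)
        if ok:
            return namespace, attempt
    raise RuntimeError(f"no valid result after {max_rounds} rounds")


def stub_generate(feedback):
    """Hypothetical coding agent: the first draft has a bug, the second fixes it."""
    if feedback is None:
        return "result = sum(values) / len(values)"  # 'values' is undefined here
    return "values = [1.0, 2.0, 3.0]\nresult = sum(values) / len(values)"


def stub_check(namespace):
    """Hypothetical result check: require that a 'result' variable was produced."""
    if "result" not in namespace:
        return False, "variable 'result' was not produced"
    return True, None


namespace, attempts = run_with_feedback(stub_generate, stub_check)
print(attempts, namespace["result"])
```

Capping the number of rounds keeps the loop from retrying indefinitely when the generator cannot repair its own output.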