AI-Assisted Scientific Assessment: A Case Study on Climate Change

Christian Buck, Levke Caesar, Michelle Chen Huebscher, Massimiliano Ciaramita, Erich M. Fischer, Zeke Hausfather, Özge Kart Tokmak, Reto Knutti, Markus Leippold, Joseph Ludescher, Katharine J. Mach, Sofia Palazzo Corner, Kasra Rafiezadeh Shahi, Johan Rockström, Joeri Rogelj, Boris Sakschewski

TL;DR

This work evaluates a Gemini-based AI assistant embedded in a climate-science workflow to assess AMOC stability, a verification-poor problem requiring consensus-driven knowledge. The study shows that AI can accelerate drafting and improve presentation while keeping reasoning coherent, but substantial expert oversight is essential for rigor; humans ultimately authored the majority of the final content. Over five weeks of work with 13 scientists, AI contributed to revisions and sourcing, with about 58% of the final content produced by humans and roughly 42% influenced by AI, indicating a productive human-AI collaboration rather than replacement. The results highlight the potential of hybrid intelligence for scalable, traceable, and rigorous scientific assessment and point to future directions for trusted full-stack AI co-scientists in climate risk assessment.

Abstract

The emerging paradigm of AI co-scientists focuses on tasks characterized by repeatable verification, where agents explore search spaces in 'guess and check' loops. This paradigm does not extend to problems where repeated evaluation is impossible and ground truth is established by the consensus synthesis of theory and existing evidence. We evaluate a Gemini-based AI environment designed to support collaborative scientific assessment, integrated into a standard scientific workflow. In collaboration with a diverse group of 13 scientists working in the field of climate science, we tested the system on a complex topic: the stability of the Atlantic Meridional Overturning Circulation (AMOC). Our results show that AI can accelerate the scientific workflow. The group produced a comprehensive synthesis of 79 papers through 104 revision cycles in just over 46 person-hours. The AI contribution was significant: most AI-generated content was retained in the report. AI also helped maintain logical consistency and presentation quality. However, expert additions were crucial to make the report acceptable: less than half of it was produced by AI. Furthermore, substantial oversight was required to expand and elevate the content to rigorous scientific standards.

Paper Structure (77 sections, 5 equations, 7 figures, 10 tables)


Figures (7)

  • Figure 1: The evolution of the AMOC report, visualized as a history flow (Viegas et al., 2004), a method introduced to analyze the edit history of Wikipedia articles. We visualize only the main body, excluding references, which are section-specific in Phase 2 and then unified in Phase 3. Transitions between versions are represented as vertical bands. Edits are attributed via color-coded horizontal bands. The Assistant provided the first version (0). Visually, the process is a somewhat chaotic, rich weaving of AI and human contributions.
  • Figure 2: Document similarity scores over successive versions of the AMOC report, compared to the final version or step-wise.
  • Figure 3: Visualization of the results of the sentence alignment process for the AMOC report.
  • Figure 4: A screenshot of the Assistant user interface. The highlighted text triggers a pop-up message box that can be used to send feedback to co-authors, or to the Gemini assistant, to generate a trace of the content, to inspect how it originated and evolved over time, or to ask the Assistant about the content in general. The screenshot also highlights the 'Welcome Back!' message, which updates the users about what happened since their last session.
  • Figure 5: User Instruction (Before)
  • ...and 2 more figures