AI-Assisted Scientific Assessment: A Case Study on Climate Change
Christian Buck, Levke Caesar, Michelle Chen Huebscher, Massimiliano Ciaramita, Erich M. Fischer, Zeke Hausfather, Özge Kart Tokmak, Reto Knutti, Markus Leippold, Joseph Ludescher, Katharine J. Mach, Sofia Palazzo Corner, Kasra Rafiezadeh Shahi, Johan Rockström, Joeri Rogelj, Boris Sakschewski
TL;DR
This work evaluates a Gemini-based AI assistant embedded in a climate-science workflow to assess the stability of the Atlantic Meridional Overturning Circulation (AMOC). This problem offers little opportunity for repeated verification, and knowledge about it is established by expert consensus. The study shows that AI can accelerate drafting and improve presentation while keeping the report's reasoning coherent, but substantial expert oversight remains essential for scientific rigor. Over five weeks with 13 scientists, AI contributed to revisions and sourcing. About 58% of the final content was produced by humans and roughly 42% was influenced by AI, which indicates productive human-AI collaboration rather than replacement. The results highlight the potential of hybrid intelligence for scalable, traceable, and rigorous scientific assessment and point to future directions for trusted full-stack AI co-scientists in climate risk assessment.
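A human/AI split like the one above can be expressed as a length-weighted share of the final text. The sketch below is a minimal illustration that assumes each segment of the final report carries a hypothetical `origin` label ("human" or "ai") and is weighted by character count. The `Segment` class, the `contribution_shares` function, and the toy segments are illustrative assumptions. They do not describe the attribution method used in the study and are not data from it.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    origin: str  # hypothetical label: "human" or "ai"


def contribution_shares(segments):
    """Return the fraction of final-report characters attributed to each origin."""
    total = sum(len(s.text) for s in segments)
    if total == 0:
        return {}
    counts = {}
    for s in segments:
        # Accumulate character counts per origin label.
        counts[s.origin] = counts.get(s.origin, 0) + len(s.text)
    return {origin: n / total for origin, n in counts.items()}


# Toy report with placeholder text (illustrative only, not from the study).
report = [
    Segment("H" * 50, "human"),
    Segment("A" * 30, "ai"),
    Segment("H" * 20, "human"),
]
print(contribution_shares(report))  # {'human': 0.7, 'ai': 0.3}
```

Character weighting is only one possible choice. Counting sentences or tracking edits across revisions would give different splits, especially for text that is "influenced by" rather than wholly written by one party.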
Abstract
The emerging paradigm of AI co-scientists focuses on tasks characterized by repeatable verification, where agents explore search spaces in 'guess and check' loops. This paradigm does not extend to problems where repeated evaluation is impossible and ground truth is instead established through consensus synthesis of theory and existing evidence. We evaluate a Gemini-based AI environment designed to support collaborative scientific assessment and integrated into a standard scientific workflow. In collaboration with a diverse group of 13 scientists working in climate science, we tested the system on a complex topic: the stability of the Atlantic Meridional Overturning Circulation (AMOC). Our results show that AI can accelerate the scientific workflow: the group produced a comprehensive synthesis of 79 papers through 104 revision cycles in just over 46 person-hours. The AI's contribution was significant. Most AI-generated content was retained in the report, and the AI helped maintain logical consistency and presentation quality. However, expert additions were crucial to making the report acceptable, and less than half of it was produced by AI. Furthermore, substantial expert oversight was required to expand the content and raise it to rigorous scientific standards.
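To make the contrast concrete, the 'guess and check' paradigm can be sketched as a loop that samples candidates and accepts the first one an automatic verifier approves. This is a generic illustration, not an implementation of the AI co-scientist systems mentioned above. The names `guess_and_check`, `propose`, and `verify`, along with the integer-factoring task, are hypothetical choices made for this sketch.

```python
import random
from typing import Callable, Optional


def guess_and_check(
    propose: Callable[[random.Random], int],
    verify: Callable[[int], bool],
    budget: int,
    seed: int = 0,
) -> Optional[int]:
    """Sample candidates until one passes the verifier or the budget runs out."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = propose(rng)
        # The loop relies on a cheap, repeatable, exact check of each guess.
        if verify(candidate):
            return candidate
    return None


# Toy task with an exact verifier: find a nontrivial factor of 391 (= 17 * 23).
N = 391
factor = guess_and_check(
    propose=lambda rng: rng.randint(2, N - 1),
    verify=lambda k: N % k == 0,
    budget=10_000,
)
print(factor)
```

The loop only works because `verify` can be called as often as needed and always gives a definitive answer. A consensus question such as AMOC stability offers no such oracle.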
