Accelerating Earth Science Discovery via Multi-Agent LLM Systems
Dmitrii Pantiukhin, Boris Shapkin, Ivan Kuznetsov, Antonia Anna Jost, Nikolay Koldunov
TL;DR
The paper identifies the fragmentation and scale of geoscience data archives as a barrier to reuse and discovery, largely due to metadata and format heterogeneity. It proposes multi-agent LLM systems, exemplified by PANGAEA GPT, that orchestrate domain-specific agents to access tools, perform analysis, and produce validated outputs through retrieval-augmented generation (RAG) and domain validators. The work outlines an architectural blueprint for a centralized, supervisor-driven MAS with sandboxed tools, memory management, and validation modules, and discusses both practical challenges (benchmarking and QA) and future directions. The approach has the potential to improve data accessibility, foster cross-disciplinary collaboration, and accelerate geoscientific discoveries, while also enabling the revitalization of historical datasets and more efficient expedition planning.
Abstract
This Perspective explores the transformative potential of Multi-Agent Systems (MAS) powered by Large Language Models (LLMs) in the geosciences. Users of geoscientific data repositories face challenges due to the complexity and diversity of data formats, inconsistent metadata practices, and a considerable number of unprocessed datasets. MAS can improve scientists' interaction with geoscientific data by enabling intelligent data processing, natural language interfaces, and collaborative problem-solving capabilities. We illustrate this approach with "PANGAEA GPT", a specialized MAS pipeline integrated with the diverse PANGAEA database for Earth and Environmental Science, demonstrating how MAS-driven workflows can effectively manage complex datasets and accelerate scientific discovery. We discuss how MAS can address current data challenges in the geosciences, highlight advancements in other scientific fields, and propose future directions for integrating MAS into geoscientific data processing pipelines. Overall, we show how MAS can fundamentally improve data accessibility, promote cross-disciplinary collaboration, and accelerate geoscientific discoveries.
