Exploring Robust Multi-Agent Workflows for Environmental Data Management

Boyuan Guan, Jason Liu, Yanzhao Wu, Kiavash Bahreini

Abstract

Embedding LLM-driven agents into environmental FAIR data management is compelling - they can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions. However, replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate into irreversible actions such as DOI minting and public release. We introduce EnviSmart, a production data management system deployed on campus-wide storage infrastructure for environmental research. EnviSmart treats reliability as an architectural property through two mechanisms: a three-track knowledge architecture that externalizes behaviors (governance constraints), domain knowledge (retrievable context), and skills (tool-using procedures) as persistent, interlocking artifacts; and a role-separated multi-agent design where deterministic validators and audited handoffs restore fail-stop semantics at trust boundaries before irreversible steps. We compare two production deployments. The University's GIS Center Ecological Archive (849 curated datasets) serves as a single-agent baseline. SF2Bench, a compound flooding benchmark comprising 2,452 monitoring stations and 8,557 published files spanning 39 years, validates the multi-agent workflow. The multi-agent approach improved both efficiency - the full deployment was completed by a single operator in two days, with repeated artifact reuse across deployments - and reliability: audited handoffs detected and blocked a coordinate transformation error affecting all 2,452 stations before publication. A representative incident (ISS-004) demonstrated boundary-based containment with 10-minute detection latency, zero user exposure, and 80-minute resolution. This paper has been accepted at PEARC 2026.

Paper Structure

This paper contains 13 sections, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: The "fail-open" problem under composition. LLM agents can succeed in bounded, reversible tasks (left), but reliability degrades across end-to-end publication workflows with irreversible steps (center). Core LLM limitations compound under composition (right), yielding multiplicative reliability decay $p^n$ (bottom). At production scale, silent failures create irreversible contamination and trust collapse.
  • Figure 2: System design overview. Top: Three-Track artifact store externalizing governance (Track 1), semantic context (Track 2), and executable procedures (Track 3) that interlock at execution time. Bottom: multi-agent operating model with role-separated agents (data preparation, publishing, platform operations) connected by audited handoffs (H1/H2) with deterministic validators before state-changing publication.
  • Figure 3: Three-Track artifact graph and an enforceable "do-not-guess" gate. Left: the full artifact graph with typed nodes for Behaviors (Track 1), Knowledge (Track 2), and Skills (Track 3) used during production. Right: zoomed interlock between a metadata-creation workflow (Track 3) and an authoritative mapping source behavior (Track 1). The behavior enforces that when filenames do not align, the workflow must obtain an authoritative mapping rather than guessing.
  • Figure 4: Audited handoff protocol. Each agent-to-agent transition follows four phases: prepare (package outputs with provenance), validate (deterministic gates), approve (record and escalate on failure), and commit (apply state changes). Failures block downstream execution. Inset: production incident ISS-004, from boundary detection to verified resolution.
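The reliability decay in Figure 1 and the fail-stop gating in Figure 4 can be illustrated with a minimal sketch. Assuming (as a simplification not spelled out in the abstract) that each of $n$ pipeline steps succeeds independently with probability $p$, end-to-end success is $p^n$; the function names below are illustrative, not from the paper.

```python
# Minimal sketch of multiplicative reliability decay under composition.
# Assumption: n pipeline steps, each succeeding independently with
# probability p, so the end-to-end success probability is p**n.

def end_to_end_reliability(p: float, n: int) -> float:
    """Probability that all n independent steps succeed."""
    return p ** n

# Even highly reliable individual steps decay quickly once composed
# into a long publication workflow:
for n in (1, 5, 10, 20):
    print(f"{n:2d} steps: {end_to_end_reliability(0.95, n):.3f}")
```

In this reading, a deterministic validator at a trust boundary (Figure 4's validate phase) restores fail-stop semantics: a failed check blocks the commit phase outright rather than letting an erroneous output propagate with probability one into an irreversible step such as DOI minting.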