Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction

Knut-Andreas Lie, Olav Møyner, Elling Svee, Jakob Torben

TL;DR

This paper presents JutulGPT, a reference implementation of agentic scientific simulation built on the fully differentiable, Julia-based reservoir simulator JutulDarcy. Model construction is organized as an execution-grounded interpret-act-validate loop in which the simulator serves as the authoritative arbiter of physical validity rather than merely a runtime.

Abstract

LLM agents are increasingly used for code generation, but physics-based simulation poses a deeper challenge: natural-language descriptions of simulation models are inherently underspecified, and different admissible resolutions of implicit choices produce physically valid but scientifically distinct configurations. Without explicit detection and resolution of these ambiguities, neither the correctness of the result nor its reproducibility from the original description can be assured. This paper investigates agentic scientific simulation, where model construction is organized as an execution-grounded interpret-act-validate loop and the simulator serves as the authoritative arbiter of physical validity rather than merely a runtime. We present JutulGPT, a reference implementation built on the fully differentiable Julia-based reservoir simulator JutulDarcy. The agent combines structured retrieval of documentation and examples with code synthesis, static analysis, execution, and systematic interpretation of solver diagnostics. Underspecified modelling choices are detected explicitly and resolved either autonomously (with logged assumptions) or through targeted user queries. The results demonstrate that agent-mediated model construction can be grounded in simulator validation, while also revealing a structural limitation: choices resolved tacitly through simulator defaults are invisible to the assumption log and to any downstream representation. A secondary experiment with autonomous reconstruction of a reference model from progressively abstract textual descriptions shows that reconstruction variability exposes latent degrees of freedom in simulation descriptions and provides a practical methodology for auditing reproducibility. All code, prompts, and agent logs are publicly available.
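
The interpret-act-validate loop described in the abstract can be pictured with a small sketch. This is not JutulGPT's code and it does not call JutulDarcy: every name here (`interpret_act_validate`, `ValidationResult`, `LoopState`, and the `toy_*` stand-ins for the language model and the simulator) is a hypothetical illustration, written in Python only for compactness. It assumes that validation returns a pass/fail flag plus diagnostic text, and that the diagnostics are fed back into the next synthesis attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ValidationResult:
    ok: bool               # True when the stand-in simulator ran to completion
    diagnostics: str = ""  # error text or solver report returned to the agent


@dataclass
class LoopState:
    code: str = ""
    assumptions: List[str] = field(default_factory=list)  # logged autonomous choices
    attempts: int = 0


def interpret_act_validate(
    request: str,
    detect_ambiguities: Callable[[str], List[str]],
    resolve: Callable[[str], str],
    synthesize: Callable[[str, List[str], str], str],
    validate: Callable[[str], ValidationResult],
    max_attempts: int = 5,
) -> LoopState:
    """Hypothetical loop: interpret the request, generate code, validate by execution."""
    state = LoopState()
    # Interpret: make each detected ambiguity explicit and log how it was resolved.
    for question in detect_ambiguities(request):
        state.assumptions.append(f"{question} -> {resolve(question)}")
    feedback = ""
    while state.attempts < max_attempts:
        state.attempts += 1
        # Act: synthesize code from the request, the logged assumptions, and prior diagnostics.
        state.code = synthesize(request, state.assumptions, feedback)
        # Validate: a completed run ends the loop, a failure triggers a revision cycle.
        result = validate(state.code)
        if result.ok:
            return state
        feedback = result.diagnostics
    raise RuntimeError(f"no valid model after {max_attempts} attempts: {feedback}")


# Toy stand-ins for the language model and the simulator (illustration only).
def toy_detect(request: str) -> List[str]:
    return [] if "grid" in request else ["grid resolution?"]


def toy_resolve(question: str) -> str:
    return "assume 10x10x1 cells"


def toy_synthesize(request: str, assumptions: List[str], feedback: str) -> str:
    # The first draft fails; the draft written after seeing diagnostics succeeds.
    return "model_v2" if feedback else "model_v1"


def toy_validate(code: str) -> ValidationResult:
    if code == "model_v1":
        return ValidationResult(False, "undefined variable")
    return ValidationResult(True)


demo = interpret_act_validate(
    "quarter five-spot waterflood", toy_detect, toy_resolve, toy_synthesize, toy_validate
)
print(demo.attempts, demo.code, demo.assumptions)
```

The sketch also makes the limitation noted in the abstract visible: only choices that pass through `detect_ambiguities` reach the assumption log, so anything the simulator fills in by default inside `validate` leaves no trace in `state.assumptions`.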

Paper Structure (27 sections, 1 equation, 9 figures, 1 table)


Figures (9)

  • Figure 1: Typical iterative interpret-act-validate loop used by JutulGPT. The agent incrementally interprets user intent, detects ambiguities, and resolves them either autonomously (with explicit assumption logging) or via targeted user queries. Code generation is grounded in retrieved documentation and validated through static analysis, execution, and simulator diagnostics, with failures triggering revision cycles. The loop terminates when the simulator runs to completion, an event that constitutes a validity certificate by virtue of JutulDarcy's internal enforcement of conservation tolerances, closure consistency, and solver convergence.
  • Figure 2: Taxonomy of well models produced by JutulGPT in response to a documentation query.
  • Figure 3: Saturation profiles extracted at fixed fractions of injected pore volume (PVI), providing a normalized basis for comparison independent of injection rate or grid resolution.
  • Figure 4: Explanation generated by the agent when asked to provide a formal description of the simulated problem, including assumptions, governing equations, constitutive laws, introduction of fractional form and mobility ratio, physical explanation of favorable versus unfavorable flow, and brief discussion of the importance of the quarter-five-spot problem.
  • Figure 5: 3D reservoir model generated by JutulGPT. The front half uses semi-transparent colors to distinguish the three stratigraphic layers (each with ten grid layers of different average permeability and porosity), while the back half shows $\log_{10}$ of permeability to reveal the stochastic heterogeneity correctly produced within each layer. Four corner injectors (red, I1-I4) and three interior producers (blue, P1-P3) are placed for a peripheral waterflood. In this classic setup, water is injected to displace more viscous oil toward the producers; the anticlinal trap and layered heterogeneity together control sweep efficiency.
  • ...and 4 more figures
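
The normalization mentioned in the Figure 3 caption can be illustrated with a short sketch. Pore volumes injected (PVI) is the cumulative injected volume divided by the total pore volume, so comparing profiles at fixed PVI fractions removes the dependence on injection rate. The helpers below (`cumulative_pvi`, `steps_at_pvi`) are hypothetical and are not taken from JutulGPT or JutulDarcy; they assume non-negative injection rates, one rate per report step, and consistent units.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def cumulative_pvi(
    rates: Sequence[float], dts: Sequence[float], pore_volume: float
) -> List[float]:
    """PVI after each report step: cumulative injected volume / total pore volume."""
    if pore_volume <= 0:
        raise ValueError("pore_volume must be positive")
    injected = 0.0
    pvi = []
    for rate, dt in zip(rates, dts):
        injected += rate * dt  # volume injected during this step
        pvi.append(injected / pore_volume)
    return pvi


def steps_at_pvi(pvi: Sequence[float], targets: Sequence[float]) -> List[Optional[int]]:
    """Index of the first report step whose PVI reaches each target, or None if never reached.

    Assumes pvi is non-decreasing, which holds for non-negative injection rates.
    """
    found: List[Optional[int]] = []
    for target in targets:
        i = bisect_left(pvi, target)
        found.append(i if i < len(pvi) else None)
    return found


# Example: 1000 m^3 of pore volume, 50 m^3/day injected over ten 2-day report steps.
pvi = cumulative_pvi([50.0] * 10, [2.0] * 10, 1000.0)
steps = steps_at_pvi(pvi, [0.25, 0.5, 0.75])
print(steps)
```

Saturation profiles would then be read from the simulation output at the returned step indices; a finer comparison could interpolate between report steps instead of taking the first step that reaches each target.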