ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms

Juan Diego Toscano, Daniel T. Chen, George Em Karniadakis

TL;DR

This work presents ATHENA, an agentic lab that unifies Scientific Computing and Scientific Machine Learning through the HENA loop, recasting research as a Contextual Bandit to achieve sample-efficient discovery. Conceptual Scaffolding constrains the action space to expert blueprints, enabling robust, verifiable methodological evolution via Agentic Teams and a proposer-critic policy. ATHENA demonstrates deep physical reasoning, automatic discovery of exact or highly accurate solvers, and effective human-in-the-loop interventions, including hybrid PINN–FEM workflows for multiphysics problems. The framework achieves state-of-the-art accuracy (e.g., $4.76\times 10^{-14}$ MSE in viscous Burgers) and exhibits strong collaborative capabilities, signaling a paradigm shift toward autonomous laboratories that accelerate scientific discovery while preserving mathematical rigor.

Abstract

Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We introduce ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms), an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual Bandit problem. Acting as an online learner, the system analyzes prior trials to select structural 'actions' ($A_n$) from combinatorial spaces guided by expert blueprints (e.g., Universal Approximation, Physics-Informed constraints). These actions are translated into executable code ($S_n$) to generate scientific rewards ($R_n$). ATHENA transcends standard automation: in SciC, it autonomously identifies mathematical symmetries for exact analytical solutions or derives stable numerical solvers where foundation models fail. In SciML, it performs deep diagnosis to tackle ill-posed formulations and combines hybrid symbolic-numeric workflows (e.g., coupling PINNs with FEM) to resolve multiphysics problems. The framework achieves super-human performance, reaching validation errors of $10^{-14}$. Furthermore, collaborative "human-in-the-loop" intervention allows the system to bridge stability gaps, improving results by an order of magnitude. This paradigm shifts the focus from implementation mechanics to methodological innovation, accelerating scientific discovery.
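To make the loop structure concrete, the following is a minimal sketch of the action, state, observation, and reward cycle described in the abstract. It is not the ATHENA implementation: the toy task (estimating the derivative of $\sin x$ at $x = 1$ by finite differences), the action space `ACTIONS`, and the functions `implement`, `execute`, `reward`, `policy`, and `run_loop` are all hypothetical names chosen for illustration, and the simple explore-then-exploit rule merely stands in for the proposer-critic policy.

```python
import math

# Hypothetical action space: (scheme, step size) pairs stand in for expert blueprints.
ACTIONS = [(s, h) for s in ("forward", "central") for h in (1e-1, 1e-2, 1e-3)]

def implement(action):
    """Implementation operator: translate the plan A_n into an executable state S_n."""
    scheme, h = action
    if scheme == "forward":
        return lambda f, x: (f(x + h) - f(x)) / h
    return lambda f, x: (f(x + h) - f(x - h)) / (2.0 * h)

def execute(state):
    """Execution: run S_n and return the observation O_n (a derivative estimate)."""
    return state(math.sin, 1.0)

def reward(observation):
    """Advisor: score O_n as a reward R_n (higher means smaller error)."""
    error = abs(observation - math.cos(1.0))
    return -math.log10(error + 1e-300)  # tiny offset guards against log10(0)

def policy(history):
    """Policy: choose A_n from the context of prior (action, reward) trials."""
    tried = {a for a, _ in history}
    untried = [a for a in ACTIONS if a not in tried]
    if untried:
        return untried[0]  # explore: try each blueprint once
    return max(history, key=lambda trial: trial[1])[0]  # exploit the best so far

def run_loop(n_trials):
    history = []
    for _ in range(n_trials):
        action = policy(history)       # A_n
        state = implement(action)      # S_n
        observation = execute(state)   # O_n
        history.append((action, reward(observation)))  # R_n feeds the next context
    return history

history = run_loop(8)
best_action, best_reward = max(history, key=lambda trial: trial[1])
```

In this toy setting the loop settles on the central-difference scheme with the smallest step, since that pair has the lowest truncation error among the six candidates. The point is only the shape of the feedback cycle; a realistic policy would reason over much richer context than a list of scalar rewards.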

Paper Structure

This paper contains 46 sections, 30 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Online Learning as a Model for Agentic Research. This diagram illustrates the research lifecycle modeled as a Contextual Bandit problem. The cycle begins with the Study phase (Policy $\pi$), where the Strategist synthesizes the Problem context and prior Rewards ($R_n$) to formulate a structural Plan or Action ($A_n$). The Implementation phase (Operator $\mathcal{I}$) translates this abstract plan into executable Code or State ($S_n$). Following Execution, the raw Observables ($O_n$)—such as loss curves and solution fields—are processed by the Error Analysis (Advisor) agent. Crucially, this agent computes the Scientific Reward ($R_n$), closing the feedback loop and enabling the system to iteratively minimize regret in subsequent trials.
  • Figure 2: ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms). The framework is organized into four logical groups, with specific icons indicating the heterogeneous allocation of Large Language Models (LLMs) to specialized roles (see legend). (A) The Conceptualization Group (Red): The user-facing triage system. The User interacts with a Coordinator to define a User Request, which the Gatekeeper routes to the appropriate team. (B) The 'Policy' Operator $\pi$ (Green): The "brain" of the HENA loop, composed of the Strategy Team and Advisor Team. This group analyzes the context $C_n$ to formulate the structural action $A_n$ and assigns the scientific reward $R_n$ based on the resulting observations. (C) The 'Implementation' Operator $\mathcal{I}$ (Blue): The "executor" teams (Code Retrieval, Implementation, Debugging) that build, patch ("cell-by-cell refactoring"), and debug the code to produce the executable state $S_n$. Crucially, this workflow includes an Inspector Agent, which strictly verifies that the implementation faithfully follows the desired plan $A_n$ before execution. (D) The Execution Block (Yellow): This component runs the observation function $E(O_n | S_n)$ on the state $S_n$ to generate the multi-modal observation $O_n$ (plots, logs). This observation $O_n$ is sent back to the Advisor Team (B), completing the evolutionary cycle. The user also has access to $A_n$, $S_n$, and $O_n$ at each iteration, enabling a transparent expert-in-the-loop workflow.
  • Figure 3: Comparative analysis of the 2D Inviscid Burgers' benchmark. The panels contrast the solution fields generated by state-of-the-art foundation models via direct prompting against the autonomous solution discovered by ATHENA. Baselines (Direct Prompting): Models such as GPT-5.1 and Claude Sonnet 4.5 incorrectly select Fourier Spectral methods and apply aggressive frequency filtering to handle the shock, resulting in "arrested motion" where the wave fails to propagate. Gemini 3.0 captures the dynamics but succumbs to significant instability and oscillations at the shock interface. ATHENA (Ours): By leveraging its agentic scaffolding, ATHENA diagnoses the problem's underlying symmetry and autonomously switches to the Method of Characteristics. This allows it to bypass numerical diffusion entirely and recover the Exact Solution (ground truth), validating the framework's ability to transcend naive numerical selection.
  • Figure 4: Autonomous correction of the Kelvin-Helmholtz Instability (Euler Equations). The figure tracks the evolution of the solution fields, Density ($\rho$) and Velocity ($v$), as ATHENA refines the solver configuration. Rows 1-2 (Iteration 1): The initial simulation completes with exit code 0 but exhibits a "silent physics failure." The solution is dominated by numerical diffusion; the shear layer is smeared, and the vortices fail to form because the velocity-based AMR indicator cannot distinguish the interface. Rows 3-4 (Iteration 3): Following the Advisor Team's intervention (which switched the AMR trigger to density, increased the polynomial order, and tuned shock capturing), the system recovers the correct inviscid dynamics. Note the emergence of sharp, well-defined Kelvin-Helmholtz rolls and the characteristic "cat's eye" vortex structures, demonstrating the restoration of physical fidelity.
  • Figure 5: Rayleigh-Taylor Instability (Compressible Navier-Stokes). Time evolution of density ($\rho$), velocity ($u, v$), and pressure ($p$). Unlike baselines that failed due to geometric distortions or instability, ATHENA stabilized the simulation by autonomously diagnosing two critical constraints: (1) It reconfigured the mesh topology into a $1 \times 4$ quadtree forest to match the domain's 1:4 aspect ratio, preventing element stretching. (2) It analytically derived and enforced the exact hydrostatic pressure gradient ($\nabla p = \rho \mathbf{g}$) to balance the piecewise density. These interventions prevented spurious acoustic waves, allowing the complex non-linear mixing to develop stably.
  • ...and 6 more figures
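
The caption of Figure 3 credits the Method of Characteristics with recovering the exact inviscid Burgers solution. As a hedged illustration of that technique only, the sketch below treats the simpler 1D equation $u_t + u\,u_x = 0$ rather than the 2D benchmark, and it is not the solver ATHENA produced. Along each characteristic $u$ is constant, so the solution satisfies the implicit relation $u = u_0(x - u t)$, which is solved here by Newton iteration. The initial profile $u_0(x) = 0.5 \sin x$, the evaluation time, and the function name `burgers_characteristics` are assumptions made for the example; the relation is valid only before characteristics cross (for this profile, $t < 2$).

```python
import math

def burgers_characteristics(x, t, u0, du0, iters=50, tol=1e-14):
    """Solve u = u0(x - u*t) for u at one point (x, t) by Newton iteration.

    Valid only before the breaking time, while 1 + t*u0'(xi) stays positive.
    """
    u = u0(x)  # initial guess: the initial profile at x
    for _ in range(iters):
        xi = x - u * t                        # foot of the characteristic through (x, t)
        residual = u - u0(xi)                 # residual of the implicit relation
        slope = 1.0 + t * du0(xi)             # derivative of the residual w.r.t. u
        step = residual / slope
        u -= step
        if abs(step) < tol:
            break
    return u

# Assumed initial condition (illustrative, not from the paper).
def u0(x):
    return 0.5 * math.sin(x)

def du0(x):
    return 0.5 * math.cos(x)

t = 1.0  # before the breaking time 1 / max(-u0') = 2
xs = [2.0 * math.pi * i / 200 for i in range(201)]
us = [burgers_characteristics(x, t, u0, du0) for x in xs]

# Check how well the computed values satisfy u = u0(x - u*t).
max_residual = max(abs(u - u0(x - u * t)) for x, u in zip(xs, us))
```

Because the solution is obtained by tracing characteristics instead of discretizing derivatives, no numerical diffusion is introduced, which is the property the caption highlights. Past the breaking time a shock forms, the implicit relation becomes multivalued, and this simple sketch no longer applies.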