AI for Science: March 2026 Week 13

Mar 23 – Mar 29, 2026 · 95 papers analyzed · 3 breakthroughs

Summary

Week of 2026-03-16 to 2026-03-20 (papers indexed through Mar 20; none were found for Mar 23–29). 3 breakthroughs across AI4Physics and AI4Math: (1) 2603.20179 — Claude Code autonomously executes complete HEP analysis pipelines, including event selection, systematics, and paper writing, with zero human feedback; (2) 2603.19329 — Goedel-Code-Prover achieves hierarchical proof search in Lean 4 for program verification, proving complex specifications via structured lemma decomposition; (3) 2603.16321 — a counterfactual causal mediation framework rigorously isolates genuine quantum contributions in QML circuits. Also notable: Garnet (2603.16770) trains a universal neural force field from scratch; HorizonMath (2603.15617) proposes 100+ unsolved math problems as AI research benchmarks; and AI Scientist (2603.17216) trains science agents via synthetic task scaling.

Key Takeaway

The week's signal is clear: AI agents are crossing the threshold from assistance to autonomy in scientific workflows, with HEP analysis, protein design, and formal verification all seeing end-to-end automation — raising the question of what graduate-level science work still requires a human.

Breakthroughs (3)

1. AI Agents Can Already Autonomously Perform Experimental High Energy Physics

Why Novel: Prior work automated fragments of HEP workflows; this is the first end-to-end demonstration of full experimental physics analyses executed autonomously, including the production of standalone appendix papers written entirely by the agent. The result raises the question of whether graduate-level experimental physics is now automatable.
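To make concrete what one automated pipeline stage looks like, here is a toy event-selection step of the kind such an agent writes (hypothetical cut values and variable names, not from the paper):

```python
import numpy as np

# Toy event-selection stage of a HEP analysis pipeline
# (illustrative cuts only, not the paper's analysis).
rng = np.random.default_rng(1)
n = 1000
events = {
    "pt": rng.exponential(30.0, n),   # transverse momentum (GeV)
    "eta": rng.normal(0.0, 1.5, n),   # pseudorapidity
}

# Keep events with pt > 25 GeV inside the tracker acceptance |eta| < 2.4
mask = (events["pt"] > 25.0) & (np.abs(events["eta"]) < 2.4)
selected = int(mask.sum())
efficiency = selected / n
```

Systematics studies then rerun the same selection under varied cuts and calibrations; the paper's claim is that the agent writes and iterates on all of these stages itself.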

Impact: Demonstrates that end-to-end experimental physics — a domain considered a bastion of human scientific judgment — is already automatable, with direct implications for accelerating discovery in data-rich sciences.

2. Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification

Why Novel: Prior LLM-based code verification attempts flat proof generation, which fails on complex programs. Goedel-Code-Prover introduces hierarchical decomposition into lemma subgoals, matching the structure of human expert proofs and enabling verification of programs that previously resisted automation.
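The hierarchical idea can be sketched in Lean 4 (a toy specification and helper lemma, not the paper's actual proofs): rather than attempting the whole goal flatly, the prover first states an intermediate lemma, then discharges the top-level specification with it.

```lean
-- Toy illustration of hierarchical lemma decomposition:
-- the helper lemma is proved first, then reused to close the main spec.
theorem reverse_append_singleton {α : Type} (ys : List α) (y : α) :
    (ys ++ [y]).reverse = y :: ys.reverse := by
  simp

theorem reverse_involutive {α : Type} (xs : List α) :
    xs.reverse.reverse = xs := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [reverse_append_singleton, ih]
```

The structured search decides *which* intermediate lemmas to state, which is where flat generation tends to fail on complex programs.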

Impact: Advances formal verification toward practical deployment in software engineering pipelines, with direct implications for AI-assisted trustworthy code generation.

3. How Quantum Circuits Actually Learn: A Causal Identification of Genuine Quantum Contributions

Why Novel: The central challenge in QML is distinguishing quantum from classical contributions to performance. This paper provides the first causal identification methodology grounded in counterfactual reasoning, enabling principled attribution rather than heuristic comparisons. Validated across multiple circuit architectures with 29 mathematical environments.
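In spirit, the counterfactual attribution reduces to comparing a pipeline's outcome against a matched counterfactual in which only the quantum component is swapped out, holding everything else fixed (a toy sketch with a hypothetical response function, not the paper's estimator):

```python
# Toy counterfactual attribution: vary only the quantum component,
# hold the classical architecture fixed, and difference the outcomes.
def accuracy(quantum_layer: bool, classical_width: int) -> float:
    """Hypothetical response surface, for illustration only."""
    base = 0.70 + 0.02 * classical_width
    return base + (0.05 if quantum_layer else 0.0)

factual = accuracy(quantum_layer=True, classical_width=4)
counterfactual = accuracy(quantum_layer=False, classical_width=4)
quantum_contribution = factual - counterfactual  # the isolated quantum term
```

Heuristic comparisons instead change the classical architecture along with the quantum layer, confounding the two; the mediation framework's point is to block that path.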

Impact: Provides a principled methodology for evaluating quantum advantage claims, which is critical for the field's credibility and for guiding circuit design toward genuinely quantum mechanisms.

Trends

  • Agentic science is maturing from demos to full pipelines: JFC (HEP), Agent Rosetta (proteins), and AI Scientist all demonstrate end-to-end autonomous workflows, not just tool-assisted fragments.

  • Formal verification is converging with LLM code generation: Goedel-Code-Prover, Stepwise, and several benchmarks target machine-checkable proof production as a near-term capability.

  • Universal neural force fields are becoming competitive: Garnet joins MACE-OFF23 and Espaloma in the race to replace hand-crafted force fields with learnable, transferable models.

  • Interpretable ML for physics is gaining traction: KANs (nuclear masses), MAL (physical law identification), and causal QML analysis all prioritize explainability alongside accuracy.

  • QML credibility crisis addressed: new causal frameworks (2603.16321) are emerging to separate genuine quantum contributions from classical architectural scaling.

Notable Papers (7)

1. Training a force field for proteins and small molecules from scratch

Garnet, a graph neural network force field with continuous atom typing, trained end-to-end on QM, condensed phase, and protein data — achieving competitive accuracy vs. Espaloma and OpenFF on diverse molecular benchmarks.
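The contract of any force field, learned or hand-crafted, is the same: energy as a function of atomic coordinates, forces as its negative gradient. A toy pairwise potential stands in for Garnet's graph network here (illustrative only):

```python
import numpy as np

def energy(pos, eps=1.0, sigma=1.0):
    """Toy pairwise (Lennard-Jones-style) energy over all atom pairs."""
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = np.linalg.norm(pos[i] - pos[j])
            e += 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return e

def forces(pos, h=1e-5):
    """Forces as the numerical negative gradient of the energy."""
    f = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        p = pos.copy(); p[idx] += h
        m = pos.copy(); m[idx] -= h
        f[idx] = -(energy(p) - energy(m)) / (2 * h)
    return f

# Two atoms at the pair-potential minimum: energy = -eps, forces ~ 0
dimer = np.array([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
```

A learned model like Garnet replaces the fixed pairwise form with a trained network, so accuracy transfers across chemistries instead of being baked into hand-tuned parameters.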

2. AI Scientist via Synthetic Task Scaling

Trains autonomous research agents via synthetic task scaling, generating structured training data to improve ML research idea quality beyond what current LLMs produce.

3. HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Introduces a benchmark of 100+ predominantly unsolved problems across 8 mathematical domains with automatic verification, measuring AI's capacity for genuine mathematical research (not just competition problems).

4. Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification

Integrates LLMs with neuro-symbolic proof search for interactive theorem proving in system verification, reducing manual proof effort via stepwise LLM guidance.

5. Minimum-Action Learning: Energy-Constrained Symbolic Model Selection for Physical Law Identification from Noisy Data

Selects symbolic force laws from noisy observational data by minimizing a Triple-Action functional combining trajectory reconstruction, sparsity, and energy conservation — enabling recovery of physical laws at SNR as low as 1.6.
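The selection criterion can be sketched as a composite score over candidate symbolic laws (hypothetical weights and term names; the paper's exact Triple-Action functional may differ):

```python
import numpy as np

def triple_action(residuals, coeffs, energy_drift,
                  lam_sparse=0.1, lam_energy=1.0):
    """Score a candidate symbolic law: lower is better.

    residuals    -- trajectory reconstruction errors of the candidate
    coeffs       -- its symbolic coefficients (nonzero count = complexity)
    energy_drift -- per-step violation of energy conservation
    """
    fit = float(np.mean(np.asarray(residuals) ** 2))
    sparsity = int(np.count_nonzero(coeffs))
    energy = float(np.mean(np.asarray(energy_drift) ** 2))
    return fit + lam_sparse * sparsity + lam_energy * energy
```

The best candidate minimizes the combined functional; the conservation term is what lets selection stay robust at very low signal-to-noise, since noise fits the trajectory term but not the physics.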

6. Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents

Demonstrates an LLM agent controlling the Rosetta protein design suite for generalist protein engineering tasks beyond canonical amino acids.

7. Bridging Theory and Data: Correcting Nuclear Mass Models with Interpretable Machine Learning

Applies Kolmogorov-Arnold Networks (KANs) to correct nuclear mass model residuals, improving prediction accuracy while maintaining interpretability of corrections.
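The residual-correction pattern is simple to sketch: fit a flexible model to the difference between measured values and a theory baseline, then add the learned correction back. Here a polynomial stands in for the paper's KAN (illustrative only):

```python
import numpy as np

# "Experiment" = theory baseline + structure the theory misses.
x = np.linspace(0.0, 1.0, 50)
truth = 2.0 * x + 0.3 * np.sin(6.0 * x)   # measured values
baseline = 2.0 * x                         # imperfect theory model

# Learn the residual, then add it back as a correction.
residual = truth - baseline
coeffs = np.polyfit(x, residual, deg=5)    # stand-in for the KAN
corrected = baseline + np.polyval(coeffs, x)

rms_before = float(np.sqrt(np.mean((truth - baseline) ** 2)))
rms_after = float(np.sqrt(np.mean((truth - corrected) ** 2)))
```

The appeal of KANs over a generic regressor is that the learned correction decomposes into readable univariate functions, so the physics of the residual stays inspectable.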

Honorable Mentions

  • Embodied Science: Closing the Discovery Loop with Agentic Embodied AI
  • DeePAW: A universal machine learning model for orbital-free ab initio calculations
  • Accelerating Structure-Property Relationship Discovery with Multimodal Machine Learning and Self-Driving Microscopy
  • The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning
  • Reconstructing the Type Ia Supernova Absolute Magnitude with Two-Probe Physics-Informed Neural Networks