Chain of Unit-Physics: A Primitive-Centric Approach to Scientific Code Synthesis
Vansh Sharma, Venkat Raman
TL;DR
The paper tackles the reliability gap in agentic large-language-model–driven scientific code by proposing Chain of Unit-Physics, a primitives-centric, test-driven framework that embeds first-principles constraints into a multi-agent code-generation workflow. By formalizing unit-physics primitives and employing a supervisor–diagnostic–verification loop, the approach guides synthesis toward physically consistent solvers, demonstrated on a combustion benchmark. Closed-weight systems exhibit widespread failure modes, while open-weight setups reduce some errors but still do not produce reliable end-to-end solutions without the unit-physics discipline. The Chain of Unit-Physics system converges in 5–6 iterations and matches human-expert results (mean L2 error of $3.1\times10^{-3}$%) with ≈33% faster runtime and ≈30% lower memory use, establishing a practical template for physics-grounded code generation. As models evolve, embedding first-principles checks offers robustness beyond what raw training data provides, promising more trustworthy scientific software from natural-language queries.
Abstract
Agentic large language models have been proposed as autonomous code generators for scientific computing, yet their reliability in high-stakes problems remains unclear. Developing computational scientific software from natural-language queries remains challenging, broadly because of (a) the sparse representation of domain codes in training data and (b) the limited feasibility of reinforcement learning from human feedback (RLHF) with a small expert community. To address these limitations, this work conceptualizes an inverse approach to code design, embodied in the Chain of Unit-Physics framework: a first-principles (or primitives)-centric, multi-agent system in which human expert knowledge is encoded as unit-physics tests that explicitly constrain code generation. The framework is evaluated on a nontrivial combustion task, used here as a representative benchmark for scientific problems with realistic physical constraints. Closed-weight systems and code-focused agentic variants fail to produce correct end-to-end solvers, despite tool and web access, exhibiting four recurrent error classes: interface (syntax/API) hallucinations, overconfident assumptions, numerical/physical incoherence, and configuration fragility. Open-weight models with chain-of-thought (CoT) decoding reduce interface errors but still yield incorrect solutions. On the benchmark task, the proposed framework converges within 5–6 iterations and matches the human-expert implementation (mean error of $3.1\times10^{-3}$ %), with a $\sim$33.4 % faster runtime and $\sim$30 % more efficient memory usage, at a cost comparable to mid-sized commercial APIs, yielding a practical template for physics-grounded scientific code generation. As datasets and models evolve, zero-shot code accuracy will improve; however, the Chain of Unit-Physics framework goes further by embedding first-principles analysis that is foundational to scientific codes.
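To make the idea of a unit-physics test concrete, the following is a minimal sketch of how expert knowledge might be encoded as executable checks on a candidate thermochemical state. It is not taken from the paper: the function names, the dictionary layout of `state`, the tolerances, and the two chosen primitives (mass-fraction normalization and ideal-gas consistency) are all assumptions made for illustration.

```python
import math

R_UNIVERSAL = 8.314462618  # universal gas constant, J/(mol K)


def check_mass_fractions(Y, tol=1e-8):
    """Primitive: species mass fractions are non-negative and sum to one."""
    non_negative = all(y >= -tol for y in Y.values())
    normalized = math.isclose(sum(Y.values()), 1.0, abs_tol=tol)
    return non_negative and normalized


def check_ideal_gas(p, rho, T, W_mix, rel_tol=1e-6):
    """Primitive: p = rho * R * T / W_mix for an ideal-gas mixture.

    Units assumed here: p in Pa, rho in kg/m^3, T in K, W_mix in kg/mol.
    """
    return math.isclose(p, rho * R_UNIVERSAL * T / W_mix, rel_tol=rel_tol)


def failed_unit_physics_tests(state):
    """Return the names of the primitives that the candidate state violates.

    In an agentic loop, a list like this could be handed back to the
    code-generating agent as diagnostic feedback (hypothetical usage).
    """
    results = {
        "mass_fractions": check_mass_fractions(state["Y"]),
        "ideal_gas": check_ideal_gas(
            state["p"], state["rho"], state["T"], state["W_mix"]
        ),
    }
    return [name for name, passed in results.items() if not passed]


# Illustrative state: values are made up, not drawn from the benchmark.
T, p, W_mix = 300.0, 101325.0, 0.02897
good_state = {
    "Y": {"O2": 0.25, "N2": 0.75},
    "p": p,
    "rho": p * W_mix / (R_UNIVERSAL * T),
    "T": T,
    "W_mix": W_mix,
}
bad_state = dict(good_state, Y={"O2": 0.35, "N2": 0.75})  # sums to 1.1
```

Here `failed_unit_physics_tests(good_state)` returns an empty list, while the deliberately inconsistent `bad_state` is flagged for the `mass_fractions` primitive. The design intent of such checks is that they depend only on first principles, so they remain valid regardless of which solver implementation the agents generate.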
