Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis

Chengshuai Yang

Abstract

Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specification format, and three autonomous agents -- Plan, Judge, and Execute -- that translate a one-sentence natural-language description into a validated forward model with bounded reconstruction error. A design-to-real error theorem decomposes total reconstruction error into five independently bounded terms, each linked to a corrective action. On 6 real-data modalities spanning all 5 carrier families, the automated pipeline matches expert-library quality (98.1 +/- 4.2%). Ten novel designs -- composing primitives into chains from 3D to 5D -- demonstrate compositional reach beyond any single-modality tool.

Paper Structure

This paper contains 41 sections, 8 theorems, 22 equations, 7 figures, 8 tables.

Key Result

Theorem 1

Let $A_{\text{true}}$ be the true imaging forward model and $A_{\text{agent}}$ the agent-designed model for a designable system (Definition 2). Let $\hat{x}$ be the regularized reconstruction from $A_{\text{agent}}$ and $x^*$ the true object. Under Assumptions A1--A4 (Methods), the reconstruction error decomposes into five design-error terms, each independently bounded and independently measurable.
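As an illustration of how such a decomposition might be used, the sketch below takes the five error terms named in the Figure 3 caption ($\varepsilon_{\text{spec}}$, $\varepsilon_{\text{trans}}$, $\varepsilon_{\text{FPB}}$, $\varepsilon_{\text{param}}$, $\varepsilon_{\text{unmod}}$) and reports each term's share of the total and the dominant term. It assumes, purely for illustration, that the terms combine additively; this summary does not state the exact form of the bound. The names `TERMS`, `relative_contributions`, `dominant_term`, and all numeric values are hypothetical and not taken from the paper.

```python
from typing import Dict

# The five design-error terms named in the Figure 3 caption.
TERMS = ("spec", "trans", "FPB", "param", "unmod")

# Corrective actions that the Figure 3 caption states explicitly.
# Actions for the other terms are not given in this summary, so they are omitted.
KNOWN_ACTIONS = {"param": "calibration", "unmod": "tier-lifting"}


def relative_contributions(eps: Dict[str, float]) -> Dict[str, float]:
    """Return each term's share of the total, ASSUMING an additive budget."""
    missing = [t for t in TERMS if t not in eps]
    if missing:
        raise ValueError(f"missing error terms: {missing}")
    if any(eps[t] < 0 for t in TERMS):
        raise ValueError("error terms must be non-negative")
    total = sum(eps[t] for t in TERMS)
    if total == 0:
        return {t: 0.0 for t in TERMS}
    return {t: eps[t] / total for t in TERMS}


def dominant_term(eps: Dict[str, float]) -> str:
    """Name of the term with the largest share."""
    shares = relative_contributions(eps)
    return max(TERMS, key=lambda t: shares[t])


# Made-up placeholder values, NOT results from the paper.
example = {"spec": 0.0, "trans": 0.0, "FPB": 0.005, "param": 0.06, "unmod": 0.02}
shares = relative_contributions(example)
top = dominant_term(example)
print(top, KNOWN_ACTIONS.get(top, "not stated in this summary"))
```

With these placeholder values the parameter-mismatch term has the largest share, which corresponds to the "reducible by calibration" case described for well-conditioned systems.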

Figures (7)

  • Figure 1: Specification-driven design pipeline. A natural-language prompt is the sole user input. $\mathcal{A}_P$ generates a spec.md; $\mathcal{A}_J$ validates in three sequential stages: (1) structural compilation (6 checks C1--C6), (2) Triad gate evaluation ($G_1$--$G_3$) [paperII], (3) cost and feasibility assessment. Two feedback paths handle failures: an automatic inner loop ($\mathcal{A}_P \leftrightarrow \mathcal{A}_J$, up to 3 rounds, dashed red) for structural/parametric corrections, and an outer loop (dashed blue) returning diagnostics to the user when automatic repair is exhausted. $\mathcal{A}_E$ executes reconstruction and outputs $\hat{A}$ with an explicit $\varepsilon$-bound report.
  • Figure 2: The spec.md specification format. Eight mandatory fields fully determine an imaging system design. Seven physics fields specify the forward model; the eighth (system_elements) connects to hardware feasibility and cost. Three examples span X-ray (CT), optical (CASSI), and spin (MRI) carriers. Each forward_model field is a primitive chain over the 11 FPB operators.
  • Figure 3: Error decomposition by modality. Stacked bars show the relative contribution of each Theorem 1 error term for 6 representative modalities. For well-conditioned systems (CT, MRI, e-ptychography), parameter mismatch $\varepsilon_{\text{param}}$ dominates and is reducible by calibration. For scattering-dominated systems (DOT), unmodeled physics $\varepsilon_{\text{unmod}}$ dominates and requires tier-lifting. $\varepsilon_{\text{FPB}} < 1\%$ in all cases. $\varepsilon_{\text{spec}} = \varepsilon_{\text{trans}} = 0$ for 31/36 modalities; CASSI shows nonzero $\varepsilon_{\text{trans}}$ due to dispersion--accumulation chain ambiguity.
  • Figure 4: Design-to-real validation. (a) A natural-language prompt is translated into a structured spec.md by the Plan Agent. (b) The Judge Agent validates all gates and checks (3 Triad gates + 6 compiler checks), producing a certificate. (c) Reconstruction results on three real-data modalities with ground truth: CT ($24.8$ dB, FISTA-TV on LoDoPaB), MRI ($31.7$ dB, HybridCascade++ on M4Raw), and CASSI ($24.3$ dB, GAP-TV on KAIST TSA [meng2020gap]). (d) Quality ratio (agent/expert PSNR) across 6 real-data modalities spanning all 5 carrier families: mean $98.1 \pm 4.2$%; theorem tightness ratio $\tau \in [1.8, 5.2]$, median 2.9.
  • Figure 5: Validation pyramid. Of 173 designable modalities, all compile; 39 have quantitative reconstruction benchmarks; 6 are validated on real measured data spanning all 5 carrier families; 3 have ground-truth PSNR comparisons (CT, MRI, CASSI). Each tier is a strict subset.
  • ...and 2 more figures
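The control flow described in the Figure 1 caption (Plan proposes a spec, Judge runs its stages in sequence, failures feed back to Plan for up to 3 rounds, and remaining diagnostics go to the user) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `judge` and `design`, the `Verdict` type, the toy planner, and the toy checks are all hypothetical. Only the round limit, the three-stage ordering, and the field names `forward_model` and `system_elements` come from the captions above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_INNER_ROUNDS = 3  # "up to 3 rounds" per the Figure 1 caption

Check = Callable[[dict], List[str]]  # returns a list of problems (empty = pass)


@dataclass
class Verdict:
    passed: bool
    diagnostics: List[str] = field(default_factory=list)


def judge(spec: dict, stages: List[Tuple[str, Check]]) -> Verdict:
    """Run the stages in order and stop at the first stage that reports problems."""
    for name, check in stages:
        problems = check(spec)
        if problems:
            return Verdict(False, [f"{name}: {p}" for p in problems])
    return Verdict(True)


def design(prompt: str,
           plan: Callable[[str, List[str]], dict],
           stages: List[Tuple[str, Check]]) -> Tuple[Optional[dict], Verdict]:
    """Inner Plan/Judge loop; returns (None, verdict) when automatic repair is exhausted."""
    feedback: List[str] = []
    for _ in range(MAX_INNER_ROUNDS):
        spec = plan(prompt, feedback)
        verdict = judge(spec, stages)
        if verdict.passed:
            return spec, verdict
        feedback = verdict.diagnostics  # inner loop: diagnostics go back to the planner
    return None, verdict  # outer loop: diagnostics go back to the user


# --- Toy stand-ins (hypothetical) for the Plan agent and the three Judge stages ---
def toy_plan(prompt: str, feedback: List[str]) -> dict:
    spec = {"forward_model": ["hypothetical_op_a", "hypothetical_op_b"]}
    if feedback:  # repair the spec once the judge has complained
        spec["system_elements"] = ["hypothetical_detector"]
    return spec


def structural(spec: dict) -> List[str]:
    required = ("forward_model", "system_elements")
    return [f"missing field {k}" for k in required if k not in spec]


def triad_gates(spec: dict) -> List[str]:
    return []  # placeholder: always passes in this sketch


def cost_feasibility(spec: dict) -> List[str]:
    return []  # placeholder: always passes in this sketch


STAGES = [("structural", structural),
          ("triad_gates", triad_gates),
          ("cost", cost_feasibility)]

spec, verdict = design("a toy prompt", toy_plan, STAGES)
print(verdict.passed, sorted(spec) if spec else None)
```

Returning the last `Verdict` alongside `None` keeps the diagnostics available for the outer loop, so a caller can show the user why automatic repair failed.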

Theorems & Definitions (9)

  • Theorem 1: Design-to-Real Error Decomposition
  • Definition 2: Designable Imaging System
  • Proposition 3: Tier-Lifting
  • Theorem 4: Finite Primitive Basis [paperII]
  • Theorem 5: Triad Decomposition [paperII]
  • Theorem 6: Compression Bound [paperII]
  • Theorem 7: Calibration Sensitivity [paperII]
  • Theorem S8: Finite Primitive Basis
  • Theorem S9: Triad Decomposition