A Self-Evolving Agentic Framework for Metasurface Inverse Design

Yi Huang, Bowen Zheng, Yunxi Dong, Hong Tang, Huan Zhao, S. M. Rakibul Hasan Shawon, Hualiang Zhang

Abstract

Metasurface inverse design has become central to realizing complex optical functionality, yet translating target responses into executable, solver-compatible workflows still demands specialized expertise in computational electromagnetics and solver-specific software engineering. Recent large language models (LLMs) offer a complementary route to reducing this workflow-construction burden, but existing language-driven systems remain largely session-bounded and do not preserve reusable workflow knowledge across inverse-design tasks. We present an agentic framework for metasurface inverse design that addresses this limitation through context-level skill evolution. The framework couples a coding agent, evolving skill artifacts, and a deterministic evaluator grounded in physical simulation so that solver-specific strategies can be iteratively refined across tasks without modifying model weights or the underlying physics solver. We evaluate the framework on a benchmark spanning multiple metasurface inverse-design task types, with separate training-aligned and held-out task families. Evolved skills raise in-distribution task success from 38% to 74%, increase criteria pass fraction from 0.510 to 0.870, and reduce average attempts from 4.10 to 2.30. On held-out task families, binary success changes only marginally, but improvements in best margin together with shifts in error composition and agent behavior indicate partial transfer of workflow knowledge. These results suggest that the main value of skill evolution lies in accumulating reusable solver-specific expertise around reliable computational engines, thereby offering a practical path toward more autonomous and accessible metasurface inverse-design workflows.
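
To make the loop described in the abstract concrete, the following Python sketch outlines one plausible control flow for context-level skill evolution. It is an illustration only: the callables `run_coding_agent`, `evaluate_with_solver`, and `evolve_skill` are hypothetical stand-ins, injected by the caller, for the coding agent, the deterministic simulation-grounded evaluator, and the meta-agent's skill update; they are not the framework's actual API.

```python
def skill_evolution(tasks, skill_text, run_coding_agent,
                    evaluate_with_solver, evolve_skill, max_attempts=5):
    """Illustrative sketch of the outer skill-evolution loop.

    run_coding_agent, evaluate_with_solver, and evolve_skill are
    caller-supplied callables standing in for the coding agent, the
    simulation-grounded evaluator, and the meta-agent's skill rewrite.
    """
    for task in tasks:
        best_program, best_report = None, None
        for _ in range(max_attempts):
            # Coding agent reads the current skill text and proposes a
            # candidate optimization program for this inverse-design task.
            program = run_coding_agent(task, skill_text, feedback=best_report)

            # Evaluator runs the program against the physics solver and
            # scores it on the task criteria (e.g., SG, CPF, BM).
            report = evaluate_with_solver(task, program)

            if best_report is None or report["score"] > best_report["score"]:
                best_program, best_report = program, report
            if report["success"]:
                break

        # Meta-agent rewrites the skill text from the rollout outcome;
        # model weights and the physics solver are never modified.
        skill_text = evolve_skill(skill_text, task, best_program, best_report)

    return skill_text
```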

Paper Structure

This paper contains 17 sections, 14 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Overview of the self-evolving agentic framework for metasurface inverse design. Left: Skill evolution panel showing four successive versions of the SKILL.md file as they are refined by the optimizer across iterations, with success goal (SG) rates indicating progressive improvement. The meta-agent analyzes rollout outcomes and evolves skills via agentic crossover. Center: Coding agent panel illustrating how the decoupled code-generation component consumes evolved skill files and produces a candidate optimization program using generic tools. The program parameterizes the design, defines a loss function, and invokes gradient-based optimization through TorchRDIT [huangEigendecompositionfreeInverseDesign2024]. The CodegenEval block shows a two-level Ralph-style retry structure. Within an outer round, inner attempts preserve local contextual feedback while revising the same candidate program, whereas outer rounds reset the coding-agent session and carry forward only the best prior candidate together with compact feedback. Red dashed arrows indicate the retry loop between code generation and evaluation. Right: Physics panel depicting the differentiable TorchRDIT pipeline, from the target spectrum $T_{\mathrm{target}}(\lambda)$ and metasurface geometry parameterization through gradient-based optimization ($\theta_{t+1} = \theta_t - \eta \nabla \mathcal{L}$) to the achieved spectrum evaluated against physical criteria (SG, CPF, BM); a minimal sketch of this optimization loop follows this figure list.
  • Figure 2: Attempt-efficiency analysis before and after skill evolution. Panels (a)--(c) show IID behavior, and panels (d)--(f) show OOD behavior on the held-out test tasks. Pass@K captures how quickly tasks reach $\mathrm{SG}=1$ (a simple Pass@K sketch also follows this figure list), the transition plots summarize task-level fail/pass changes between the starter-skill baseline and the post-training evaluation, and the box plots show total attempts per task.
  • Figure 3: Error-type composition across training rounds of skill evolution. Panel (a) shows IID and panel (b) shows OOD. Infrastructure failures, such as connection errors and timeouts, are excluded so that the figure reflects code-level errors more directly tied to skill quality.
  • Figure 4: Behavior and cost analysis across training rounds of skill evolution. Panel (a) shows the average tool-call composition per task across the train and validation splits, together with reference lines from the fixed starter-skill baseline evaluated on the test split. Panel (b) shows the corresponding cost per task across the same rounds and splits, with the same test-baseline reference lines.
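
To complement the Figure 1 caption, here is a minimal PyTorch-style sketch of the gradient-based optimization step $\theta_{t+1} = \theta_t - \eta \nabla \mathcal{L}$ toward a target spectrum. The `simulate_spectrum` callable is a placeholder for a differentiable solver forward pass (e.g., a TorchRDIT simulation); it is an assumption for illustration and does not reflect TorchRDIT's actual API.

```python
import torch


def optimize_metasurface(simulate_spectrum, target_T, theta0, lr=0.02, steps=200):
    """Plain gradient-descent sketch of theta_{t+1} = theta_t - eta * grad(L).

    simulate_spectrum: placeholder for a differentiable solver call mapping
    design parameters theta to a transmission spectrum T(lambda).
    target_T: the target spectrum T_target(lambda) as a tensor.
    """
    theta = theta0.clone().detach().requires_grad_(True)
    optimizer = torch.optim.SGD([theta], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        T = simulate_spectrum(theta)            # achieved spectrum T(lambda)
        loss = torch.mean((T - target_T) ** 2)  # spectral mismatch loss L
        loss.backward()                         # gradients through the solver
        optimizer.step()                        # theta <- theta - eta * grad(L)

    return theta.detach()
```

In practice a candidate program would also clamp or reparameterize theta to stay within fabricable geometries, but that detail is omitted from this sketch.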
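
For the Pass@K curves mentioned in the Figure 2 caption, one simple reading, assumed here for illustration, is the fraction of tasks that reach $\mathrm{SG}=1$ within the first $K$ attempts. The helper below is a hypothetical sketch of that computation, not the paper's evaluation code.

```python
def pass_at_k(first_success_attempt, k):
    """Fraction of tasks that reach success (SG = 1) within k attempts.

    first_success_attempt: one entry per task, giving the 1-indexed attempt
    at which the task first succeeded, or None if it never succeeded.
    """
    solved = sum(1 for a in first_success_attempt if a is not None and a <= k)
    return solved / len(first_success_attempt)
```

Under this reading, `pass_at_k([1, 3, None, 2], k=2)` returns 0.5, since two of the four tasks succeed within two attempts.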