Chat to Chip: Large Language Model Based Design of Arbitrarily Shaped Metasurfaces

Huanshu Zhang, Lei Kang, Sawyer D. Campbell, Douglas H. Werner

TL;DR

The paper addresses the computational bottleneck in designing arbitrarily shaped metasurfaces by introducing a chat-to-chip workflow that fine-tunes a one-dimensional token-wise LLM (Meta-Llama-3.1-8B-Instruct) with LoRA on a dataset of geometry-spectrum pairs. It demonstrates that forward predictions of transmission spectra can be obtained with high fidelity and substantial speedups over full-wave solvers, while benchmarking across open-weight LLMs identifies practical, cost-efficient models for rapid prototyping. For inverse design, the approach exploits the stochasticity of LLMs to generate diverse unit-cell geometries that realize target spectra with very low error, outperforming traditional tandem networks. Collectively, the work offers a turnkey, code-free path to data-driven nanophotonics and links natural language prompts to electromagnetic modeling to accelerate metasurface exploration.

Abstract

Traditional metasurface design is limited by the computational cost of full-wave simulations, which prevents thorough exploration of complex configurations. Data-driven approaches have emerged as a solution to this bottleneck, replacing costly simulations with rapid neural network evaluations and enabling near-instant design of meta-atoms. Despite these advances, implementing a new optical function still requires building and training a task-specific network, along with exhaustive searches for suitable architectures and hyperparameters. Pre-trained large language models (LLMs), by contrast, sidestep this laborious process with a simple fine-tuning technique. However, applying LLMs to the design of nanophotonic devices, particularly arbitrarily shaped metasurfaces, is still in its early stages, as such tasks often require graphical networks. Here, we show that an LLM, fed with descriptive inputs of arbitrarily shaped metasurface geometries, can learn the physical relationships needed for spectral prediction and inverse design. We further benchmark a range of open-weight LLMs and identify relationships between accuracy and model size at the billion-parameter level. We demonstrate that 1-D token-wise LLMs provide a practical tool for designing 2-D arbitrarily shaped metasurfaces. Linking natural-language interaction to electromagnetic modelling, this "chat-to-chip" workflow represents a step toward more user-friendly data-driven nanophotonics.

Paper Structure

This paper contains 12 sections, 4 figures.

Figures (4)

  • Figure 1: Mapping arbitrarily shaped metasurface geometries to language sequences and training an LLM for rapid optical prediction. (a) A randomly sampled $4 \times 4$ grid of control points is replicated by four-fold rotational symmetry, interpolated into a $256 \times 256$ scalar field, binarized at a fixed threshold of 0.5, and regularised by iterative morphological opening/closing that removes isolated features smaller than 8,192 pixels and seals internal voids. The resulting binary mask is then extruded into a 200 nm-thick silicon layer on a 1 µm-pitch glass substrate and analysed with FDTD, establishing paired grid-spectrum data. (b) Fine-tuning and inference process for forward prediction. Each grid-spectrum pair is rewritten as a natural-language prompt that encodes the control-point grid and a target output that lists the 31 transmission values between 1,050 nm and 1,600 nm. Parameter-efficient fine-tuning (LoRA) of a pre-trained LLM minimises the cross-entropy between predicted and ground-truth tokens, so that at inference the model returns an accurate spectrum within seconds from a single grid prompt, eliminating the need for labour-intensive network design.
  • Figure 2: Predicted and simulated transmission spectra for four grids from the test set. The corresponding control-point grids and MSE are: (a) [[0.411, 0.795, 0.126, 0.233], [0.876, 0.187, 0.209, 0.911], [0.318, 0.479, 0.998, 0.826], [0.555, 0.820, 0.238, 0.058]], MSE = $7.8 \times 10^{-6}$. (b) [[0.156, 0.485, 0.350, 0.248], [0.391, 0.476, 0.083, 0.444], [0.041, 0.419, 0.524, 0.511], [0.695, 0.026, 0.690, 0.560]], MSE = $2.6 \times 10^{-6}$. (c) [[0.203, 0.155, 0.608, 0.655], [0.682, 0.541, 0.924, 0.898], [0.660, 0.610, 0.193, 0.065], [0.145, 0.508, 0.538, 0.098]], MSE = $3.6 \times 10^{-6}$. (d) [[0.049, 0.881, 0.405, 0.843], [0.288, 0.836, 0.375, 0.149], [0.736, 0.211, 0.728, 0.012], [0.471, 0.181, 0.914, 0.007]], MSE = $4.1 \times 10^{-6}$.
  • Figure 3: (a) Test-set MSE for Llama-3.1-8B versus fine-tuning epochs. Although the MSE exceeds the $5 \times 10^{-3}$ tolerance line (red dashed line) during epochs 1-4 (grey curve), once fine-tuning reaches epoch 5 the orange curve remains consistently below this tolerance and only marginally above the $2.0 \times 10^{-3}$ benchmark reached by the best hand‑tuned eight‑layer DNNs (blue dashed line), indicating that predictive accuracy is largely insensitive to training length within a certain range. (b-d) MSE after eight-epoch LoRA fine-tuning for open-weight models grouped by size: (b) mid-size checkpoints (7-9B parameters). DS-Llama-8B stands for DeepSeek-distilled Llama-3.1-8B; (c) large models (> 9B); (d) small models (< 7B).
  • Figure 4: (a) Workflow of the inverse-design stage. A target 31-point transmission spectrum is fed to the fine-tuned Llama-3.1-8B as a natural-language query for a corresponding grid; the model autoregressively returns a control-point grid that defines a candidate meta-atom. (b) Representative results for four unseen targets. The orange dashed lines are FDTD-simulated results of the inverse-designed metasurfaces. The corresponding inverse-designed grids and MSE are: top-left: [[0.550, 0.073, 0.906, 0.559], [0.324, 0.326, 0.831, 0.708], [0.916, 0.060, 0.517, 0.120], [0.023, 0, 0.249, 0.263]], MSE = $2.0 \times 10^{-7}$; top-right: [[0.360, 0.903, 0.903, 0.822], [0.419, 0.386, 0.377, 0.962], [0.744, 0.397, 0.391, 0.742], [0.890, 0.048, 0.259, 0.686]], MSE = $1.2 \times 10^{-6}$; bottom-left: [[0.460, 0.289, 0.513, 0.473], [0.199, 0.641, 0.932, 0.866], [0.757, 0.956, 0.755, 0.282], [0.9120, 0.571, 0.547, 0.876]], MSE = $1.4 \times 10^{-6}$; bottom-right: [[0.964, 0.207, 0.656, 0.287], [0.777, 0.548, 0.192, 0.460], [0.181, 0.202, 0.218, 0.812], [0.303, 0.866, 0.496, 0.582]], MSE = $3.0 \times 10^{-7}$. The histogram within the top-left panel depicts the inverse-design test-set MSE distribution, showing that over 88% of samples achieve an MSE below $1.0 \times 10^{-2}$.
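
The geometry pipeline in the Figure 1(a) caption can be illustrated with a minimal pure-Python sketch. Several details here are assumptions rather than the authors' implementation: the quadrant layout used for the four-fold replication, bilinear interpolation with corner-aligned sampling, and the convention that values at or above the threshold are treated as material. The morphological opening/closing step is omitted, and the helper names (`rotate90`, `c4_tile`, `bilinear_resize`, `make_mask`) are hypothetical.

```python
import random

def rotate90(grid):
    """Rotate a square list-of-lists by 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]

def c4_tile(grid):
    """Assemble a 2n x 2n tile that is invariant under 90-degree rotation.

    Assumed layout: top-left = grid, bottom-left = one rotation,
    bottom-right = two rotations, top-right = three rotations.
    """
    r1 = rotate90(grid)
    r2 = rotate90(r1)
    r3 = rotate90(r2)
    top = [a + b for a, b in zip(grid, r3)]
    bottom = [a + b for a, b in zip(r1, r2)]
    return top + bottom

def bilinear_resize(grid, size):
    """Bilinearly interpolate a square grid onto a size x size field (assumed scheme)."""
    n = len(grid)
    out = []
    for i in range(size):
        y = i * (n - 1) / (size - 1)
        y0 = min(int(y), n - 2)
        ty = y - y0
        row = []
        for j in range(size):
            x = j * (n - 1) / (size - 1)
            x0 = min(int(x), n - 2)
            tx = x - x0
            top = grid[y0][x0] * (1 - tx) + grid[y0][x0 + 1] * tx
            bot = grid[y0 + 1][x0] * (1 - tx) + grid[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bot * ty)
        out.append(row)
    return out

def make_mask(grid, size=256, threshold=0.5):
    """Control-point grid -> symmetric tile -> scalar field -> binary mask."""
    field = bilinear_resize(c4_tile(grid), size)
    return [[1 if v >= threshold else 0 for v in row] for row in field]

# Example: a randomly sampled 4 x 4 grid, rounded to three decimals as in the captions.
rng = random.Random(0)
grid = [[round(rng.random(), 3) for _ in range(4)] for _ in range(4)]
mask = make_mask(grid)
fill = sum(map(sum, mask)) / (256 * 256)
print(f"fill fraction: {fill:.3f}")
```

In a fuller pipeline, the mask would still need the morphological clean-up described in the caption before being passed to an FDTD solver.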
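
The text serialization described in the Figure 1(b) caption, together with the MSE metric reported in Figures 2 to 4, might look roughly like the sketch below. The prompt wording, the three-decimal formatting of the spectrum, the comma-separated output format, and the uniform spacing of the 31 wavelength points are all assumptions made for illustration; the function names are hypothetical, and the spectrum used in the example is a placeholder, not data from the paper.

```python
def wavelengths_nm(n_points=31, start=1050.0, stop=1600.0):
    """Sample wavelengths; uniform spacing is an assumption, not stated in the source."""
    step = (stop - start) / (n_points - 1)
    return [start + k * step for k in range(n_points)]

def format_grid(grid):
    """Write a control-point grid as nested brackets with three decimals."""
    rows = ("[" + ", ".join(f"{v:.3f}" for v in row) + "]" for row in grid)
    return "[" + ", ".join(rows) + "]"

def build_forward_example(grid, spectrum):
    """Turn one grid-spectrum pair into a prompt/completion record (hypothetical wording)."""
    prompt = (
        "Predict the 31-point transmission spectrum (1050-1600 nm) "
        "for the control-point grid: " + format_grid(grid)
    )
    completion = ", ".join(f"{t:.3f}" for t in spectrum)
    return {"prompt": prompt, "completion": completion}

def parse_spectrum(text, n_points=31):
    """Parse a comma-separated model reply back into floats, checking its length."""
    values = [float(tok) for tok in text.split(",") if tok.strip()]
    if len(values) != n_points:
        raise ValueError(f"expected {n_points} values, got {len(values)}")
    return values

def mse(pred, target):
    """Mean squared error between two equal-length spectra."""
    if len(pred) != len(target):
        raise ValueError("spectra must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# Grid taken from the Figure 2(a) caption; the spectrum is a placeholder.
grid = [
    [0.411, 0.795, 0.126, 0.233],
    [0.876, 0.187, 0.209, 0.911],
    [0.318, 0.479, 0.998, 0.826],
    [0.555, 0.820, 0.238, 0.058],
]
placeholder_spectrum = [0.5] * 31
record = build_forward_example(grid, placeholder_spectrum)
print(record["prompt"])
print(mse(parse_spectrum(record["completion"]), placeholder_spectrum))
```

For inverse design, the same record could be built with the roles swapped, so that the spectrum goes in the prompt and the grid in the completion. Parsing the reply strictly, as `parse_spectrum` does, is one way to catch malformed generations before they reach a solver.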