Reinforcement Learning for Photonic Component Design

Donald Witt; Jeff Young; Lukas Chrostowski

TL;DR

The paper presents a fab-in-the-loop reinforcement learning framework that directly optimizes nanophotonic devices by incorporating real fabrication deviations into the learning loop, thereby overcoming the mismatch between simulated and manufactured geometries. A spectral predictor neural network guides a Deep Deterministic Policy Gradient (DDPG) agent to generate 12-parameter grating coupler designs across wavelength bins, using measured data from fabricated chips to continually refine its predictions. Applied to photonic crystal grating couplers on an air-clad SOI platform, the approach delivers a measured insertion loss of $3.24$ dB per coupler (versus $8.8$ dB for traditional designs) and broadband designs with $<10.2$ dB loss over a $150$ nm range, with most designs meeting stringent loss criteria. The method's data efficiency, demonstrated by a single fabrication cycle generating 1250 designs and six iterative rounds, and its generalizability to other photonic components indicate a practical pathway to robust, fabrication-aware photonic design.

Abstract

We present a new fab-in-the-loop reinforcement learning algorithm for the design of nanophotonic components that accounts for the imperfections present in nanofabrication processes. As a demonstration of the potential of this technique, we apply it to the design of photonic crystal grating couplers fabricated on an air-clad, 220 nm silicon-on-insulator (SOI), single-etch platform. This fab-in-the-loop algorithm reduces the insertion loss from 8.8 dB to 3.24 dB. The widest-bandwidth designs produced using our fab-in-the-loop algorithm cover a 150 nm bandwidth with less than 10.2 dB of loss even at the lowest point of their transmission spectrum.
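The loss figures above are in decibels. Converting them with $T = 10^{-L/10}$ makes the improvement concrete: 8.8 dB means about 13% of the input power reaches the output, 3.24 dB about 47%, a roughly 3.6-fold increase, and the 10.2 dB broadband worst case corresponds to about 9.5%. The snippet is plain arithmetic on the quoted numbers and assumes, as the TL;DR states, that they are per-coupler insertion losses.

```python
def db_to_linear(loss_db: float) -> float:
    """Convert an insertion loss in dB to the fraction of power transmitted."""
    return 10 ** (-loss_db / 10)


# Per-coupler loss figures quoted in the TL;DR and abstract.
quoted_losses_db = [
    ("traditional design", 8.8),
    ("fab-in-the-loop design", 3.24),
    ("broadband worst case", 10.2),
]

for label, loss_db in quoted_losses_db:
    print(f"{label}: {loss_db} dB -> {db_to_linear(loss_db):.3f} of input power")

# Ratio of transmitted power, fab-in-the-loop versus traditional design.
improvement = db_to_linear(3.24) / db_to_linear(8.8)
print(f"power ratio: {improvement:.2f}x")
```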

Paper Structure (11 sections, 3 equations, 12 figures, 1 table)

Introduction
The Parameterized Grating Coupler
The Reinforcement Learning Algorithm
Optimization
Results
General Applicability Considerations
Conclusion
Acknowledgments
Author Contributions
Data Availability
References

Figures (12)

Figure 1: A comparison of traditional device optimization techniques with the fab-in-the-loop approach. a) In the traditional approach, the optimizer produces a design based on simulation results. The user then fabricates and measures this design and finds that its performance differs drastically from the simulated prediction because of various fabrication effects. b) The user introduces a lithography model to correct for process bias and smoothing, but this model does not account for other fabrication effects. c) In the fab-in-the-loop approach, the algorithm automatically optimizes the device for the fabrication process, without additional user input, based solely on the measured results.
Figure 2: A schematic of the parameterized grating coupler. The start of the grating, end of the grating, angle, hole radius and lattice constant are all adjustable, as are the start and end of the horizontal apodization. The vertical apodization adjusts the hole radius with distance from the center line, and the vertical apodization dividing point allows this parameter to take two different values. Not shown is the hole diameter at which the lattice constant, rather than the hole radius, is adjusted. In total, there are 12 adjustable parameters.
Figure 3: The spectral predictor. This network takes the 12 parameters of the grating coupler design and the current process bias for 80, 100, 120, and 140 nm holes and produces a power estimate for one wavelength. One hundred and fifty copies of this network are used to produce a power spectrum consisting of 150 wavelength values between 1490 and 1640 nm.
Figure 4: A schematic of our fab-in-the-loop RL algorithm. a) The feedback from measurement data. The measurement data are used both to train the spectral predictor and to update the current best design. b) The DDPG portion of the algorithm, which produces new designs. A DDPG episode starts with the current best design. The DDPG algorithm then produces a new set of design parameters, which are scored using the in-range and out-of-range scoring equations. This score is combined with the parameters and fed back into the DDPG algorithm until the end of the training episode. This process is repeated 10 000 times for each of the required wavelength ranges.
Figure 5: An example of the output from the spectral predictor (red) compared with a measured spectrum (blue). The two agree closely in this example.
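The captions for Figures 3 and 4 describe a bank of per-wavelength predictors feeding a design loop. The sketch below shows how those pieces could fit together in NumPy, under several assumptions that are not from the paper: the single-hidden-layer architecture and its width, the evenly spaced wavelength grid, and the `score` function are hypothetical; the predictor weights are untrained; and the DDPG agent is replaced by a simple random-perturbation proposer. It illustrates only the control flow of Figure 4b, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the Figure 3 caption.
N_DESIGN = 12                          # grating-coupler design parameters
N_BIAS = 4                             # process bias for 80, 100, 120 and 140 nm holes
N_WL = 150                             # one predictor copy per wavelength point
WL_NM = np.linspace(1490, 1640, N_WL)  # assumption: evenly spaced, endpoints included
HIDDEN = 32                            # hypothetical hidden-layer width

# Hypothetical predictor bank: N_WL independent one-hidden-layer MLPs. Each
# wavelength k has its own weights W1[k], W2[k]; stacking them lets einsum
# evaluate all copies together. Weights are random (untrained).
W1 = rng.normal(0.0, 1 / np.sqrt(N_DESIGN + N_BIAS), (N_WL, N_DESIGN + N_BIAS, HIDDEN))
b1 = np.zeros((N_WL, HIDDEN))
W2 = rng.normal(0.0, 1 / np.sqrt(HIDDEN), (N_WL, HIDDEN))
b2 = np.zeros(N_WL)


def predict_spectrum(design, bias):
    """Return one predicted power value per wavelength point."""
    x = np.concatenate([design, bias])                       # shape (16,)
    h = np.maximum(0.0, np.einsum("i,kih->kh", x, W1) + b1)  # shape (N_WL, HIDDEN)
    return np.einsum("kh,kh->k", h, W2) + b2                 # shape (N_WL,)


def score(spectrum, band):
    """Hypothetical score: worst predicted value inside the target band.
    This is NOT the paper's in-range/out-of-range scoring."""
    lo, hi = band
    mask = (WL_NM >= lo) & (WL_NM <= hi)
    return spectrum[mask].min()


def propose(design, step=0.05):
    """Random-perturbation stand-in for the DDPG actor (this is not DDPG)."""
    return design + rng.normal(0.0, step, design.shape)


def optimize_band(best, bias, band, n_episodes=100, episode_len=5):
    """Figure 4b control flow: each episode restarts from the best design found
    so far, proposes new parameters, scores them with the predictor and feeds
    the score back. The paper runs 10 000 episodes per wavelength range."""
    best_score = score(predict_spectrum(best, bias), band)
    for _ in range(n_episodes):
        design = best.copy()
        for _ in range(episode_len):
            design = propose(design)
            s = score(predict_spectrum(design, bias), band)
            # A DDPG agent would store (design, s) in a replay buffer and update
            # its actor and critic here; the stand-in only keeps the best design.
            if s > best_score:
                best, best_score = design.copy(), s
    return best, best_score


if __name__ == "__main__":
    bias = np.zeros(N_BIAS)      # hypothetical process-bias values
    start = np.zeros(N_DESIGN)   # hypothetical starting design
    for band in [(1500, 1550), (1550, 1600)]:  # hypothetical target ranges
        design, s = optimize_band(start, bias, band)
        print(band, round(float(s), 3))
```

Keeping the 150 copies as separate weight slices preserves the per-wavelength independence described in Figure 3 while still evaluating the whole spectrum in a single pass. Swapping `propose` for a trained actor, with a critic and replay buffer, is where a real DDPG agent would slot in.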