High-performance real-world optical computing trained by in situ gradient-based model-free optimization

Guangyuan Zhao; Xin Shu; Renjie Zhou

TL;DR

This work proposes a model-free method for lightweight in situ optimization of optical computing systems. The method is based on score gradient estimation: it treats the system as a black box and back-propagates the loss directly to the probability distributions of the optical weights. This circumvents the need for computationally heavy and biased system simulation.
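The score gradient identity behind this idea comes from exchanging the gradient and the expectation. In the sketch below, $\mathcal{L}(w)$ denotes the loss measured on the physical system with weights $w$, and $p_\theta$ is the distribution over weights. These symbols and the Gaussian example are illustrative assumptions, not the paper's exact formulation.

```latex
\nabla_\theta \, \mathbb{E}_{w \sim p_\theta}\!\left[\mathcal{L}(w)\right]
  = \int \mathcal{L}(w)\, \nabla_\theta p_\theta(w)\, dw
  = \int \mathcal{L}(w)\, p_\theta(w)\, \nabla_\theta \log p_\theta(w)\, dw
  = \mathbb{E}_{w \sim p_\theta}\!\left[\mathcal{L}(w)\, \nabla_\theta \log p_\theta(w)\right]

\approx \frac{1}{M} \sum_{m=1}^{M} \mathcal{L}(w_m)\, \nabla_\theta \log p_\theta(w_m),
  \qquad w_m \sim p_\theta

% Assumed example: p_\theta = \mathcal{N}(\mu, \sigma^2 I) with \theta = \mu, so
\nabla_\mu \log p_\theta(w) = \frac{w - \mu}{\sigma^2}
```

Only the measured values $\mathcal{L}(w_m)$ enter the estimate, so $\mathcal{L}$ itself never has to be differentiated. This is what allows the physical system to be treated as a black box.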

Abstract

Optical computing systems provide high-speed and low-energy data processing but suffer from computationally demanding training and simulation-to-reality gaps. We propose a gradient-based model-free optimization (G-MFO) method based on a Monte Carlo gradient estimation algorithm for computationally efficient in situ training of optical computing systems. This approach treats an optical computing system as a black box and back-propagates the loss directly to the probability distributions of the optical computing weights. It thereby circumvents the need for a computationally heavy and biased system simulation. Our experiments on diffractive optical computing systems show that G-MFO outperforms hybrid training on the MNIST and FMNIST datasets. Furthermore, we demonstrate image-free and high-speed classification of cells from their marker-free phase maps. Our method is model-free, performs well, and places low demands on computational resources. Together, these properties pave the way for accelerating the transition of optical computing from laboratory demonstrations to practical, real-world applications.
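The training loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are:

- The weights follow an independent Gaussian $N(\mu, \sigma^2 I)$ whose mean $\mu$ plays the role of $\theta$, with $\sigma$ held fixed.
- The leave-one-out baseline is an added variance-reduction choice.
- `toy_system_loss` and `HIDDEN_TARGET` are hypothetical stand-ins for a loss measured on the physical system.

The optimizer only ever calls the loss and never differentiates it.

```python
import random
from typing import Callable, List, Sequence

Loss = Callable[[Sequence[float]], float]

def score_gradient(loss: Loss, mu: List[float], sigma: float,
                   n_samples: int, rng: random.Random) -> List[float]:
    """Score-function estimate of d/d(mu) E_{w ~ N(mu, sigma^2 I)}[loss(w)].

    Only scalar loss values are used; `loss` is never differentiated.
    Requires n_samples >= 2 for the leave-one-out baseline.
    """
    ws = [[m + sigma * rng.gauss(0.0, 1.0) for m in mu] for _ in range(n_samples)]
    losses = [loss(w) for w in ws]  # one black-box query per sampled weight vector
    total = sum(losses)
    grad = [0.0] * len(mu)
    for w, value in zip(ws, losses):
        # Leave-one-out baseline (assumed variance-reduction choice). It is
        # independent of this sample's w, so the estimate stays unbiased.
        baseline = (total - value) / (n_samples - 1)
        for j, (wj, mj) in enumerate(zip(w, mu)):
            # d/d(mu_j) log N(w; mu, sigma^2 I) = (w_j - mu_j) / sigma^2
            grad[j] += (value - baseline) * (wj - mj) / sigma ** 2
    return [g / n_samples for g in grad]

def train(loss: Loss, dim: int, steps: int = 300, n_samples: int = 16,
          sigma: float = 0.1, lr: float = 0.05, seed: int = 0) -> List[float]:
    """Plain gradient descent on the distribution mean mu."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    for _ in range(steps):
        g = score_gradient(loss, mu, sigma, n_samples, rng)
        mu = [m - lr * gj for m, gj in zip(mu, g)]
    return mu

# Hypothetical stand-in for a physical measurement: only the scalar is visible.
HIDDEN_TARGET = [0.5, -0.3, 0.8]

def toy_system_loss(w: Sequence[float]) -> float:
    return sum((wi - ti) ** 2 for wi, ti in zip(w, HIDDEN_TARGET))

mu = train(toy_system_loss, dim=3)
print([round(m, 2) for m in mu])  # should land close to HIDDEN_TARGET
```

`score_gradient` needs only scalar loss values. The same loop would therefore apply unchanged if `toy_system_loss` were replaced by a function that uploads `w` to the hardware and reads back a loss. The number of hardware queries per step equals `n_samples`.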

Paper Structure (44 sections, 17 equations, 14 figures, 10 tables, 4 algorithms)

Introduction
Related Work
In situ training strategies to optimize the optical computing system
Zeroth-Order Optimization and Its Applications to Computational Optics
Application side of optical computing
Methodology
Problem setup
In silico simulator-based training (SBT)
In situ G-MFO
Experiment and Simulation Details of Our Diffractive Optical Computing System
Results
General performance evaluation on the MNIST and FMNIST dataset
Simulation comparison between G-MFO and other zeroth-order optimization methods on a small dataset
Simulation results on the two-layer optical computing system $\hat{f}_{sys-2-layer}$
G-MFO outperforms hybrid training (HBT) experimentally on a single-layer optical computing system $f_{sys-1-layer}$
...and 29 more sections

Figures (14)

Figure 1: Training the optical computing system with gradient-based model-free optimization. (a) The brown highlights show that our gradient-based model-free training strategy back-propagates the training error to the distribution parameter $\theta$. This bypasses the need for an accurate differentiable model of the optical system $f_{sys}$ and for knowledge of the inputs $\{x_i\}_{i=1}^N$. (b) The blue highlights show that conventional training of the optical computing system relies on a physics-based simulator $\hat{f}_{sys}$, which substitutes for the inaccessible $f_{sys}$ of the real system. The training process back-propagates the loss through the simulator $\hat{f}_{sys}$ to update the weight $w$. This is the basis of the SBT and HBT methods.
Figure 2: System misalignment in a real optical computing system degrades the performance of a system trained solely with a physics-based simulator. (a) Testing accuracy drops from $82.2\%$ to $36.4\%$ with a rotation misalignment of $\Delta \phi_I=0.01\degree$. (b) Testing accuracy drops from $82.2\%$ to $51.0\%$ when the x'-axis misalignment of the optical computing layer $\Delta d_c$ is $41.1 \: \mu m$. (c) Testing accuracy drops from $82.2\%$ to $50.9\%$ when the y'-axis misalignment of the output layer $\Delta d_o$ is $62.4 \: \mu m$.
Figure 3: A visual illustration of the training process for the G-MFO experiment on the real system.
Figure 4: Experimental outputs and confusion matrices of the single-layer optical computing system trained with G-MFO. (a) An input phase-object digit '$2$' from the MNIST dataset is modulated by the optical computing layer with weight $w$ trained using G-MFO. The system correctly predicts the input as digit '$2$', because the output image has its largest intensity in the region corresponding to digit '$2$'. (b) Confusion matrix on the MNIST dataset, with a training accuracy of $83.1\%$. (c) An example of a 'pullover' from the FMNIST dataset is correctly predicted. (d) Confusion matrix on the FMNIST dataset, with a training accuracy of $74.0\%$.
Figure 5: G-MFO balances sample efficiency and task performance during optimization. We compare the training curves of the two-point zeroth-order (ZO) method (green), the full-point ZO method (blue), and our G-MFO method (orange). The full-point ZO method and G-MFO both achieve $99\%$ training accuracy on the small dataset. Notably, G-MFO needs only approximately $4\%$ of the total samples that the full-point ZO method requires to reach $99\%$ accuracy. In contrast, the two-point ZO method fails to optimize effectively. The right panel presents an enlarged view of the selected region from the left panel.
...and 9 more figures
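Figure 1 contrasts two training paths. SBT back-propagates through a simulator $\hat{f}_{sys}$, while G-MFO estimates gradients from measurements of $f_{sys}$ alone. The toy below is a hypothetical one-dimensional example, not anything from the paper, that illustrates why the distinction matters. The simulator omits a misalignment offset, so gradient descent on it converges to the wrong weight. A score-gradient loop that queries only the "real" loss converges near the true optimum. `real_system_loss`, `simulator_grad`, `TRUE_OPTIMUM`, and `MISALIGNMENT` are all illustrative assumptions.

```python
import random

TRUE_OPTIMUM = 0.8   # hypothetical: the weight the real (misaligned) system needs
MISALIGNMENT = 0.3   # hypothetical: an offset the simulator does not model

def real_system_loss(w: float) -> float:
    # Black box: only this scalar can be "measured".
    return (w - TRUE_OPTIMUM) ** 2

def simulator_grad(w: float) -> float:
    # Differentiable simulator missing the misalignment: it places the
    # optimum at TRUE_OPTIMUM - MISALIGNMENT.
    return 2.0 * (w - (TRUE_OPTIMUM - MISALIGNMENT))

def train_sbt(steps: int = 200, lr: float = 0.1) -> float:
    """Gradient descent through the (biased) simulator."""
    w = 0.0
    for _ in range(steps):
        w -= lr * simulator_grad(w)
    return w

def train_gmfo(steps: int = 300, n_samples: int = 16, sigma: float = 0.1,
               lr: float = 0.05, seed: int = 0) -> float:
    """Score-gradient descent on the mean of w ~ N(mu, sigma^2), real loss only."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        ws = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        losses = [real_system_loss(w) for w in ws]
        total = sum(losses)
        grad = 0.0
        for w, value in zip(ws, losses):
            baseline = (total - value) / (n_samples - 1)  # leave-one-out baseline
            grad += (value - baseline) * (w - mu) / sigma ** 2
        mu -= lr * grad / n_samples
    return mu

w_sbt, w_gmfo = train_sbt(), train_gmfo()
print(f"SBT:   w={w_sbt:.3f}, real loss={real_system_loss(w_sbt):.4f}")
print(f"G-MFO: w={w_gmfo:.3f}, real loss={real_system_loss(w_gmfo):.4f}")
```

The SBT weight converges to the simulator's optimum, so its real-system loss equals the squared misalignment. This is a qualitative analogue of the accuracy drops in Figure 2; the toy makes no quantitative claim about them.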
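Figure 4 describes the readout rule. The predicted class is the detector region with the largest output intensity, and the confusion matrices tabulate true versus predicted classes. A minimal sketch follows. The captions do not specify the detector geometry, so the rectangular detector layout, the `Region` tuple convention, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical detector layout: (row0, row1, col0, col1), half-open ranges.
Region = Tuple[int, int, int, int]

def region_energies(intensity: Sequence[Sequence[float]],
                    regions: Sequence[Region]) -> List[float]:
    """Sum the output-plane intensity inside each class's detector region."""
    return [
        sum(intensity[r][c] for r in range(r0, r1) for c in range(c0, c1))
        for (r0, r1, c0, c1) in regions
    ]

def predict(intensity: Sequence[Sequence[float]], regions: Sequence[Region]) -> int:
    """Predicted class = index of the brightest detector region."""
    energies = region_energies(intensity, regions)
    return max(range(len(energies)), key=energies.__getitem__)

def confusion_matrix(true_labels: Sequence[int], predicted: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        cm[t][p] += 1  # rows: true class, columns: predicted class
    return cm

def accuracy(cm: List[List[int]]) -> float:
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)

# Usage on a 6x6 toy output plane with three detectors.
regions = [(0, 2, 0, 2), (0, 2, 4, 6), (4, 6, 2, 4)]
image = [[0.0] * 6 for _ in range(6)]
image[1][5] = 1.0  # bright spot inside detector 1
image[5][3] = 0.4  # weaker spot inside detector 2
print(predict(image, regions))  # 1
```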
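Figure 5 compares the sample cost of zeroth-order estimators. The sketch below counts black-box queries for two common estimators. It assumes that "two-point ZO" means a central difference along one random unit direction and that "full-point ZO" means central differences along every coordinate. These may not match the paper's exact definitions. `CountingOracle`, `two_point_zo`, and `coordinate_zo` are hypothetical helpers.

```python
import math
import random
from typing import Callable, List, Sequence

class CountingOracle:
    """Wraps a black-box loss and counts how many times it is queried."""

    def __init__(self, loss: Callable[[Sequence[float]], float]):
        self.loss = loss
        self.queries = 0

    def __call__(self, w: Sequence[float]) -> float:
        self.queries += 1
        return self.loss(w)

def two_point_zo(f, w: List[float], h: float, rng: random.Random) -> List[float]:
    """d * (f(w + h u) - f(w - h u)) / (2h) * u for a random unit u: 2 queries."""
    d = len(w)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    plus = f([wi + h * ui for wi, ui in zip(w, u)])
    minus = f([wi - h * ui for wi, ui in zip(w, u)])
    scale = d * (plus - minus) / (2 * h)  # factor d since E[u u^T] = I / d
    return [scale * ui for ui in u]

def coordinate_zo(f, w: List[float], h: float) -> List[float]:
    """Central differences along every coordinate: 2 * d queries."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += h
        w_minus = list(w)
        w_minus[j] -= h
        grad.append((f(w_plus) - f(w_minus)) / (2 * h))
    return grad

d = 100
target = [0.01 * j for j in range(d)]
f = CountingOracle(lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target)))
w0 = [0.0] * d
g_coord = coordinate_zo(f, w0, 1e-3)
print(f.queries)  # 200
f.queries = 0
g_two = two_point_zo(f, w0, 1e-3, random.Random(0))
print(f.queries)  # 2
```

One two-point estimate costs 2 queries but is noisy in high dimensions. The coordinate-wise estimate is accurate but costs $2d$ queries per step. A score-function estimator with $M$ sampled weights costs $M$ queries per step, which is one way to read the trade-off Figure 5 reports.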
