Spatially Varying Nanophotonic Neural Networks
Kaixuan Wei, Xiao Li, Johannes Froech, Praneeth Chakravarthula, James Whitehead, Ethan Tseng, Arka Majumdar, Felix Heide
TL;DR
This work tackles the gap between optical neural networks and modern digital models by embedding computation in the camera optics. It introduces a large-kernel spatially-varying (LKSV) convolution implemented with a meta-optical front-end of nanophotonic metalenses and a lightweight electronic backend, so that most of the computation (>99% of MACs) is performed optically, with a 4 mm front-end footprint. The LKSV kernel is learned via a low-dimensional reparameterization that factorizes a $15 \times 15$ kernel into seven $3 \times 3$ kernels and uses a spatially-varying basis, trained with regularizers to yield robust optical performance. Experimentally, the system reaches $72.76\%$ accuracy on CIFAR-10, outperforming AlexNet on that dataset with far fewer electronic parameters, and demonstrates transfer to ImageNet (top-5 accuracy of $48.64\%$) and other vision tasks, validating the practicality of reconfigurable optical computing at the edge. Overall, this work shows that photonic front-ends can achieve modern deep-learning performance with ultra-low power, enabling fast, compact, edge-friendly AI accelerators.
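The kernel factorization mentioned above can be illustrated with a minimal sketch. It assumes that "factorizing" means composing the seven $3 \times 3$ kernels by successive convolution, which is one common reading and not necessarily the authors' exact reparameterization; the helper names `full_conv2d` and `compose_kernels` are hypothetical, and the spatially-varying basis and regularizers are not modeled here.

```python
import numpy as np

def full_conv2d(a, b):
    """Full 2D convolution of two small arrays; each output side grows by len(b) - 1."""
    ah, aw = a.shape
    bh, bw = b.shape
    out = np.zeros((ah + bh - 1, aw + bw - 1))
    # Accumulate shifted, scaled copies of `a`, one per entry of `b`.
    for i in range(bh):
        for j in range(bw):
            out[i:i + ah, j:j + aw] += b[i, j] * a
    return out

def compose_kernels(small_kernels):
    """Convolve a list of small kernels into one effective large kernel (assumed composition)."""
    effective = small_kernels[0]
    for k in small_kernels[1:]:
        effective = full_conv2d(effective, k)
    return effective

# Seven random 3x3 kernels stand in for learned parameters (illustrative values only).
rng = np.random.default_rng(0)
small_kernels = [rng.standard_normal((3, 3)) for _ in range(7)]

large_kernel = compose_kernels(small_kernels)
n_small = sum(k.size for k in small_kernels)

print(large_kernel.shape)            # (15, 15)
print(n_small, large_kernel.size)    # 63 225
```

Each composition step widens the kernel by 2 pixels per side pair, so seven $3 \times 3$ kernels give a support of $3 + 6 \cdot 2 = 15$. Under this assumed composition the large kernel is described by 63 free parameters instead of 225, which also means it can represent only a constrained subset of all $15 \times 15$ kernels.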
Abstract
The explosive growth in the computation and energy cost of artificial intelligence has spurred strong interest in new computing modalities as potential alternatives to conventional electronic processors. Photonic processors, which execute operations using photons instead of electrons, promise to enable optical neural networks with ultra-low latency and power consumption. However, existing optical neural networks, limited by their underlying network designs, have achieved image recognition accuracy far below that of state-of-the-art electronic neural networks. In this work, we close this gap by embedding massively parallelized optical computation into flat camera optics that perform neural network computation during capture, before an image is recorded on the sensor. Specifically, we harness large kernels and propose a large-kernel spatially-varying convolutional neural network learned via low-dimensional reparameterization techniques. We experimentally instantiate the network with a flat meta-optical system that encompasses an array of nanophotonic structures designed to induce angle-dependent responses. Combined with an extremely lightweight electronic backend of approximately 2K parameters, we demonstrate that a reconfigurable nanophotonic neural network reaches 72.76\% blind test classification accuracy on the CIFAR-10 dataset. As such, for the first time, an optical neural network outperforms the first modern digital neural network, AlexNet (72.64\%) with 57M parameters, bringing optical neural networks into the modern deep learning era.
