Compressed Meta-Optical Encoder for Image Classification

Anna Wirth-Singh, Jinlin Xiang, Minho Choi, Johannes E. Fröch, Luocheng Huang, Shane Colburn, Eli Shlizerman, Arka Majumdar

TL;DR

The paper presents a hybrid optical-electronic CNN that replaces most convolutional processing with a single optical convolution implemented by PSF-engineered meta-optics, while an electronic backend performs the classification. Knowledge distillation from a pretrained AlexNet-Mod teacher makes it possible to compress the network to a single linear convolutional layer followed by a fully connected electronic backend, circumventing the need for optical nonlinearities. Experimentally, a 16-kernel meta-optic front end coupled to a calibrated electronic backend achieves ~93–94% accuracy on MNIST with ~86K MACs (versus ~17M for the all-electronic AlexNet-Mod), an estimated reduction of over two orders of magnitude in latency and power while maintaining competitive accuracy. The benefits are expected to scale favorably to high-resolution inputs, since the optical convolution runs in effectively constant time regardless of input size, and the optical front end integrates with existing CNN architectures.
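Knowledge distillation is what lets the small linear student recover accuracy from the AlexNet-Mod teacher. The summary does not spell out the loss, so the sketch below assumes the standard Hinton-style formulation (hard-label cross-entropy plus a temperature-softened KL term); the function names, `temperature`, and `alpha` are illustrative choices, not values taken from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; zero-probability terms in p are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # illustrative value, not from the paper
    alpha: float = 0.5,        # weight on the hard-label term (illustrative)
) -> float:
    """Hinton-style KD loss (assumed form):
    alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T).
    The T^2 factor keeps soft-target gradients on the same scale as the hard term."""
    hard = -math.log(softmax(student_logits)[label])
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Example: 3-class logits from a hypothetical teacher and student.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.0]
print(distillation_loss(student, teacher, label=0))
```

When the student already matches the teacher, the soft term vanishes and only the hard-label cross-entropy remains; the soft term is what transfers the teacher's "dark knowledge" about relative class similarities to the compressed network.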

Abstract

Optical and hybrid convolutional neural networks (CNNs) have recently attracted increasing interest for low-latency, low-power image classification and other computer vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers of a standard CNN significantly reduces accuracy. In this work, we use knowledge distillation to compress a modified AlexNet to a single linear convolutional layer and an electronic backend (two fully connected layers). The compressed network achieves performance comparable to a purely electronic CNN with five convolutional layers and three fully connected layers. We implement the convolution optically by engineering the point spread function (PSF) of an inverse-designed meta-optic. Using this hybrid approach, we estimate that the number of multiply-accumulate (MAC) operations drops from 17M in a conventional electronic modified AlexNet to only 86K in the hybrid compressed network enabled by the optical front end. This corresponds to a reduction of over two orders of magnitude in latency and power consumption. Furthermore, we experimentally demonstrate that the system's classification accuracy exceeds 93% on the MNIST dataset.
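The MAC figures in the abstract can be sanity-checked with simple arithmetic. This minimal sketch uses the textbook MAC count for a dense convolutional layer and the two totals quoted above; the 16-kernel, 5×5, single-channel layer used to illustrate scaling with input size is hypothetical, not the paper's exact configuration.

```python
import math


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a dense 2D convolution: each of the h_out * w_out * c_out
    outputs needs c_in * k * k multiply-accumulates."""
    return h_out * w_out * c_out * c_in * k * k


# Approximate totals quoted in the abstract.
electronic_alexnet_mod = 17_000_000
hybrid_network = 86_000

ratio = electronic_alexnet_mod / hybrid_network
print(f"MAC reduction: ~{ratio:.0f}x (about 10^{math.log10(ratio):.1f})")

# Hypothetical layer: 16 single-channel 5x5 kernels with 'valid' padding on an
# n x n input. Its electronic cost grows as ~n^2, whereas an optical front end
# applies the convolution in a single pass of light.
for n in (28, 256, 1024):
    side = n - 5 + 1
    print(f"{n}x{n} input: {conv2d_macs(side, side, c_in=1, c_out=16, k=5):,} MACs")
```

The ratio 17M / 86K is roughly 198, i.e. about 10^2.3, consistent with "over two orders of magnitude." The hypothetical layer's electronic cost rises from 230,400 MACs at 28×28 to 416,160,000 at 1024×1024, which is the kind of cost an optical convolution is meant to absorb.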

Paper Structure (15 sections, 4 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Schematic of convolutional neural networks for image classification tasks. (a) All-electronic multi-layered CNN. (b) All-electronic compressed CNN. (c) Hybrid CNN which combines an optical meta-optic front end and electronic backend. (d) The number of multiply-accumulate (MAC) operations of each network configuration, with convolutional MACs in green and fully-connected (FC) MACs in brown.
  • Figure 2: Schematic of the optical system. (a) PSF measurement setup using a monochromatic point light source (left) and optical convolution measurement setup using a micro-LED display (right). (b) Photograph of the fabricated meta-optic, which contains 16 different sub-optics arranged in a single layer that operate in parallel for classification. (c) Phase maps and SEM images of example sub-optics corresponding to the positive and negative parts of a particular convolutional kernel. (d) The positive and negative parts of an example convolutional kernel (left), with the corresponding simulated PSF (middle) and experimentally measured PSF (right). (e) Convolved output for the example kernel from electronic simulation (left) and from the optical experiment (right), for an input "7" from MNIST.
  • Figure 3: Confusion matrices for different network architectures. (a) Classification results for AlexNet-Mod (multiple-layer electronic CNN). (b) Classification results for the all-electronic CNN compressed without using knowledge distillation. (c) Classification results for the all-electronic CNN compressed with knowledge distillation. (d) Classification results for the hybrid optical-electronic CNN.
  • Figure 4: PCA of the hybrid CNN. (a) PCA of the uncalibrated experimental hybrid CNN classification data. (b) PCA of the calibrated experimental data, which has been re-mapped and exhibits clustering behavior similar to that of the compressed electronic CNN data. (c) PCA of the compressed electronic CNN data.
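Figure 2 pairs each convolutional kernel with two sub-optics, one for its positive part and one for its negative part. A plausible reason, sketched below under idealized assumptions, is that an intensity PSF cannot be negative: a signed kernel K is split as K = K_pos - K_neg, each nonnegative part is applied by its own sub-optic, and the two outputs are subtracted. Here the subtraction is assumed to happen electronically, the model ignores kernel flipping, noise, and detector scaling, and all function names are hypothetical.

```python
Matrix = list[list[float]]


def split_kernel(kernel: Matrix) -> tuple[Matrix, Matrix]:
    """Split a signed kernel K into nonnegative parts with K = K_pos - K_neg."""
    pos = [[max(v, 0.0) for v in row] for row in kernel]
    neg = [[max(-v, 0.0) for v in row] for row in kernel]
    return pos, neg


def correlate_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation (the operation CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def signed_conv_from_intensities(image: Matrix, kernel: Matrix) -> Matrix:
    """Emulate two nonnegative optical channels, then subtract them electronically."""
    pos, neg = split_kernel(kernel)
    out_pos = correlate_valid(image, pos)  # sub-optic for the positive part
    out_neg = correlate_valid(image, neg)  # sub-optic for the negative part
    return [[p - n for p, n in zip(rp, rn)] for rp, rn in zip(out_pos, out_neg)]


# Example: a signed 2x2 kernel applied to a small nonnegative "image".
image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
kernel = [[1.0, -1.0], [-2.0, 0.5]]
print(correlate_valid(image, kernel))
print(signed_conv_from_intensities(image, kernel))
```

Because convolution is linear, the difference of the two nonnegative-channel outputs equals the signed convolution exactly in this idealized model; in hardware, mismatch between the two sub-optics is one source of error that calibration would need to absorb.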