Quantum-limited stochastic optical neural networks operating at a few quanta per activation

Shi-Yuan Ma, Tianyu Wang, Jérémie Laydevant, Logan G. Wright, Peter L. McMahon

TL;DR

The paper shows that optical neural networks can operate in a regime where each neuron is activated by only a few photons, so that shot noise is unavoidable. By training with a physics-informed stochastic model (physics-aware stochastic training), the authors achieve accurate MNIST classification with single-photon activations and ultra-low optical energy per MAC. Experimentally, a two-layer SPDNN yields up to 98% MNIST accuracy with optical energy on the order of 0.013–0.038 photons per MAC. The study also outlines coherent, deeper SPDNNs and argues that physics-aware training software can unlock substantial benefits in ultra-low-power hardware, with potential extensions beyond optical implementations.

Abstract

Energy efficiency in computation is ultimately limited by noise, with quantum limits setting the fundamental noise floor. Analog physical neural networks hold promise for improved energy efficiency compared to digital electronic neural networks. However, they are typically operated in a relatively high-power regime so that the signal-to-noise ratio (SNR) is large, and the noise can be treated as a perturbation. We study optical neural networks where all layers except the last are operated in the limit that each neuron can be activated by just a single photon, and as a result the noise on neuron activations is no longer merely perturbative. We show that by using a physics-based probabilistic model of the neuron activations in training, it is possible to perform accurate machine-learning inference in spite of the extremely high shot noise (SNR ~ 1). We experimentally demonstrated MNIST handwritten-digit classification with a test accuracy of 98% using an optical neural network with a hidden layer operating in the single-photon regime; the optical energy used to perform the classification corresponds to just 0.038 photons per multiply-accumulate (MAC) operation. Our physics-aware stochastic training approach might also prove useful with non-optical ultra-low-power hardware.
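The photons-per-MAC figure and the SNR ~ 1 regime mentioned above can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names (`mlp_macs`, `photons_per_mac`, `shot_noise_snr`) and the hidden-layer width of 100 are assumptions for the example, not values taken from the paper, and counting the MACs of both layers is one possible accounting convention that the abstract does not spell out. The SNR helper relies on the standard property that Poisson-distributed photon counts with mean N have standard deviation sqrt(N).

```python
import math


def mlp_macs(n_in, n_hidden, n_out):
    """Multiply-accumulate operations in one forward pass of a
    two-layer perceptron (n_in -> n_hidden -> n_out)."""
    return n_in * n_hidden + n_hidden * n_out


def photons_per_mac(detected_photons, macs):
    """Average number of detected photons spent per MAC."""
    return detected_photons / macs


def shot_noise_snr(mean_photons):
    """Shot-noise-limited SNR: mean / standard deviation of a Poisson
    count, i.e. N / sqrt(N) = sqrt(N)."""
    return math.sqrt(mean_photons)


if __name__ == "__main__":
    # Hypothetical network size (assumption): 784 inputs, 100 hidden, 10 outputs.
    macs = mlp_macs(784, 100, 10)
    # Photon budget per inference implied by 0.038 photons per MAC
    # under the accounting convention assumed above.
    budget = 0.038 * macs
    print("MACs per inference:", macs)
    print("Implied photon budget:", round(budget))
    # A neuron that detects about one photon sits at SNR of about 1.
    print("SNR at 1 detected photon:", shot_noise_snr(1.0))
```

The point of the arithmetic is that a budget well below one photon per MAC is possible because each detected photon is shared across all the multiplications that feed a single neuron.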

Paper Structure (9 sections, 3 equations, 4 figures)

This paper contains 9 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: Deterministic inference using noisy neural-network hardware. a, The concept of a stochastic physical neural network performing a classification task. Given a particular input image to classify, repetitions exhibit variation (represented by different traces of the same color), but the class is predicted nearly deterministically. b, The signal-to-noise ratio (SNR) of single-photon-detection neural networks (SPDNNs) compared to conventional optical neural networks (ONNs). Conventional ONNs operate with high photon budgets (SNR $\gg 1$) to obtain reliable results, whereas SPDNNs operate with low photon budgets---of up to just a few detected photons per shot (SNR $\sim 1$). The relation between the detected optical energy (in number of photons $N_\textrm{p}$) and SNR is SNR $=\sqrt{N_\textrm{p}}$, which is known as the shot-noise limit.
  • Figure 2: Single-photon-detection neural networks (SPDNNs): physics-aware stochastic training and inference. a, A single layer of an SPDNN, comprising an optical matrix-vector multiplier (optical MVM, in grey) and single-photon detectors (SPDs; in red), which perform stochastic nonlinear activations. Each output neuron's value is computed by the physical system as $a_i=f_{\text{SPD}}(z_i)$, where $z_i$ is the weighted sum (shown in green) of the input neurons to the $i$th output neuron computed as part of the optical MVM, and $a_i$ is the stochastic binary output from a single-photon detector. b, Forward and backward propagation through the SPD activation function. The optical energy ($\lambda$) incident on an SPD is a function of $z_i$ that depends on the encoding scheme used. Forward propagation uses the stochastic binary activation function $f_{\text{SPD}}$, while backpropagation involves the mean-field function of the probability $P_{\text{SPD}}$. c, Probability of an SPD detecting a click (output $a=1$) or not (output $a=0$), as a function of the incident light energy $\lambda$. d, Optical inference using an SPDNN with $L$ layers. The activation values from the SPD array of each layer are passed to light emitters for the optical MVM of the next layer. The last layer uses a conventional photodetector (PD) array instead of an SPD array, and is operated with enough optical energy that the output of this layer has high SNR. e, In silico training of an SPDNN with $L$ layers. Each forward propagation is stochastic, and during backpropagation, the error vector is passed to the hidden layers using the mean-field probability function $P_{\text{SPD}}$ instead of the stochastic activation function $f_{\text{SPD}}$. In this figure, $\partial x$ is shorthand for $\partial C / \partial x$, where $C$ is the cost function.
  • Figure 3: Performance of a single-photon-detection neural network (SPDNN) on MNIST handwritten-digit classification. a, An SPDNN realizing a multilayer perceptron (MLP) architecture with $N$ neurons in the hidden layer. The hidden layer ($784 \rightarrow N$) was computed using an incoherent optical matrix-vector multiplier (MVM) followed by a single-photon-detector (SPD) array. Each SPD realized a stochastic activation function for a single hidden-layer neuron. During a single inference, the hidden layer was executed a small number of times ($1 \leq K \leq 5$), yielding averaged activation values. The output layer ($N \rightarrow 10$) was realized either optically---using an optical MVM and a high photon budget to achieve high readout SNR, as in conventional ONNs---or with a digital electronic processor, yielding a result with full numerical precision. b, Simulated test accuracy of MNIST handwritten-digit classification for models with different numbers of hidden neurons $N$ and shots per activation $K$. Error bars, representing standard deviations from 100 repeated stochastic implementations with identical inputs and weights, are plotted but are too small to be easily visible. c, Experimental evaluation of the SPDNN, with the output layer performed with full numerical precision on a digital computer. Results are presented for $K=1$ (single-shot, i.e., no averaging; top), $K=2$ (middle), and $K=5$ (bottom) shots per activation. Mean values and standard deviations (shown as error bars) were calculated from repeated stochastic implementations using identical inputs and weights (see Supplementary Note 8 for details). d, Experimental evaluation of the SPDNN, with both the hidden and the output layer executed using the optical experimental apparatus. The average number of detected photons used per inference in the hidden layer was kept fixed and the number used per inference in the output layer was varied. Mean values and standard deviations (shown as error bars) were calculated from repeated stochastic implementations using identical inputs and weights (see Supplementary Note 9 for details).
  • Figure 4: Simulation study predicting the performance of proposed coherent single-photon-detection neural networks (SPDNNs). a, The probability of detecting a photon as a function of the input light amplitude in a coherent SPDNN. Real-valued numbers are encoded in coherent light with either 0 phase (positive numbers) or $\pi$ phase (negative numbers). Measurement by a single-photon detector (SPD) results in the probabilistic detection of a photon that is proportional to the square of the encoded value $z$, in comparison to intensity encodings with incoherent light. b, Structure of a convolutional SPDNN with a kernel size of $5\times5$. Single-shot SPD measurements ($K=1$) are performed after each layer (by an SPD array), except for the output layer. Average $2\times2$ pooling is applied after each convolutional operation. A digital rectified linear unit (ReLU; Agarap, 2018) activation function can also be used in the linear layer as an alternative. c, Schematic of a convolutional layer with SPD activations. d, Simulated test accuracy of coherent SPDNNs with varying architecture performing MNIST handwritten-digit classification. The multilayer perceptron (MLP) models had 400 neurons in each hidden layer. The convolutional model consisted of a convolutional layer with 16 output channels, followed by two linear layers with an SPD activation in between. e, Simulated test accuracy of coherent SPDNNs with varying architecture performing CIFAR-10 image classification. The models have four convolutional layers, each followed by SPD activation functions. The two linear layers can either be implemented in full precision with a ReLU activation function (in purple) or using the SPD activation function. The number of output channels for each convolutional layer is indicated above the corresponding data point.
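The stochastic activation described in the Figure 2 caption (a binary click in the forward pass, a smooth mean-field probability in the backward pass, and averaging over $K$ shots as in Figure 3) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Poisson photon statistics and an ideal detector (unit efficiency, no dark counts), which gives $P_{\text{SPD}}(\lambda) = 1 - e^{-\lambda}$. The captions above do not state this formula explicitly, and the function names `p_spd`, `f_spd`, `dp_spd_dlam`, and `averaged_activation` are hypothetical.

```python
import math
import random


def p_spd(lam):
    """Mean-field click probability for mean incident photon number lam.
    Assumes Poisson statistics and an ideal detector: P = 1 - exp(-lam)."""
    return 1.0 - math.exp(-lam)


def f_spd(lam, rng):
    """Stochastic binary activation used in the forward pass:
    returns 1 (click) with probability p_spd(lam), otherwise 0."""
    return 1 if rng.random() < p_spd(lam) else 0


def dp_spd_dlam(lam):
    """Derivative of the mean-field probability with respect to lam.
    In the backward pass this smooth slope stands in for the gradient
    of the non-differentiable binary activation f_spd."""
    return math.exp(-lam)


def averaged_activation(lam, shots, rng):
    """Average of `shots` independent single-shot activations (K shots)."""
    return sum(f_spd(lam, rng) for _ in range(shots)) / shots


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    lam = 1.0               # about one photon incident on the detector
    print("mean-field probability:", p_spd(lam))
    print("single-shot samples:", [f_spd(lam, rng) for _ in range(10)])
    print("K=5 averaged activation:", averaged_activation(lam, 5, rng))
    print("backward-pass slope:", dp_spd_dlam(lam))
```

How $\lambda$ is obtained from the pre-activation $z$ depends on the encoding, as the Figure 2 caption notes; for example, an intensity encoding could use $\lambda = z$ and a coherent amplitude encoding $\lambda = z^2$, both of which are assumptions in this sketch.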