Training neural networks with end-to-end optical backpropagation

James Spall, Xianxin Guo, A. I. Lvovsky

TL;DR

This work addresses the challenge of implementing backpropagation optically, which is straightforward in a digital computer but has so far remained elusive in optics. The scheme is surprisingly simple: saturable absorbers serve as the activation units. The work demonstrates the possibility of constructing NNs that rely entirely on analog optical processes for both training and inference.

Abstract

Optics is an exciting route for the next generation of computing hardware for machine learning, promising several orders of magnitude enhancement in both computational speed and energy efficiency. However, to reach the full capacity of an optical neural network, the computing must be implemented optically not only for inference but also for training. The primary algorithm for training a neural network is backpropagation, in which the calculation is performed in the order opposite to the information flow for inference. While straightforward in a digital computer, optical implementation of backpropagation has so far remained elusive, particularly because of the conflicting requirements for the optical element that implements the nonlinear activation function. In this work, we address this challenge for the first time with a surprisingly simple and generic scheme. Saturable absorbers are employed as the activation units, and the required properties are achieved through a pump-probe process, in which the forward propagating signal acts as the pump and the backward propagating signal as the probe. Our approach is adaptable to various analog platforms, materials, and network structures, and it demonstrates the possibility of constructing neural networks entirely reliant on analog optical processes for both training and inference tasks.
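
The pump-probe idea in the abstract can be illustrated numerically. The sketch below is not taken from this page: it assumes a textbook two-level saturable-absorber model in which a field `z` (in units of the saturation field) sees an absorption of `alpha0 / (1 + z**2)`. The parameter `alpha0` (resonant optical depth) and all function names are hypothetical. Under that assumption, the transmission seen by a weak backward probe equals the exact derivative of the activation at zero pump and approaches it again at strong pump.

```python
import math


def sa_forward(z, alpha0):
    """Field transmitted by the absorber for a forward (pump) field z.

    Assumed model: absorption alpha0 / (1 + z**2), z in saturation units.
    """
    return z * math.exp(-0.5 * alpha0 / (1.0 + z * z))


def sa_exact_derivative(z, alpha0):
    """Analytic derivative of sa_forward with respect to z (same assumed model)."""
    t = math.exp(-0.5 * alpha0 / (1.0 + z * z))
    return t * (1.0 + alpha0 * z * z / (1.0 + z * z) ** 2)


def probe_transmission(z, alpha0):
    """Transmission of a weak backward probe while the pump z is present.

    The probe is assumed too weak to change the absorber's saturation,
    so it only sees the absorption level set by the pump.
    """
    return math.exp(-0.5 * alpha0 / (1.0 + z * z))


if __name__ == "__main__":
    alpha0 = 4.0  # hypothetical optical depth, chosen only for illustration
    for z in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(z, sa_exact_derivative(z, alpha0), probe_transmission(z, alpha0))
```

In this model the two quantities differ by the factor `1 + alpha0 * z**2 / (1 + z**2)**2`, which is never smaller than 1, so the probe transmission always has the same sign as the true derivative. That is why it can stand in for the derivative during the backward pass.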

Paper Structure (13 sections, 4 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Illustration of optical training. (a) Network architecture of the ONN used in this work, which consists of two fully-connected linear layers and a hidden layer. (b) Simplified experimental schematic of the ONN. Each linear layer performs optical matrix-vector multiplication (MVM) with a cylindrical lens and a spatial light modulator (SLM) that encodes the weight matrix. Hidden layer activations are computed using saturable absorption (SA) in an atomic vapour cell. Light propagates in both directions during optical training. (c) Working principle of SA activation. The forward beam (pump) is shown by solid red arrows, the backward beam (probe) by purple wavy arrows. The probe transmission depends on the strength of the pump and approximates the gradient of the SA function. For high forward intensity (top panel), a large portion of the atoms are excited to the upper level. Stimulated emission produced by these atoms largely compensates the absorption due to the atoms in the ground level. For a weak pump (bottom panel), the excited level population is low and the absorption is significant. (d) Neural network training procedure. (e) Optical training procedure. Both signal and error propagation in the two directions are fully implemented optically. Loss function calculation and parameter update are left for electronics without interrupting the optical information flow.
  • Figure 2: Multi-layer ONN characterisation. (a) Scatter plots of measured against theory results for MVM-1 (first layer forwards), MVM-2a (second layer forwards) and MVM-2b (second layer backwards). All three MVM results are taken simultaneously. Histograms of the signal and noise error for each MVM are displayed underneath. (b) First-layer activations $a^{(1)}_{\rm meas}$ measured after the vapour cell, plotted against the theoretically expected linear MVM-1 output $z^{(1)}_{\rm theory}$ before the cell. The green line is a best-fit curve of the theoretical SA nonlinear function. (c) The transmitted amplitude of a weak probe with constant input, passed backwards through the vapour cell, as a function of the pump $z^{(1)}_{\rm theory}$. Measurements for both forward and backward beams are taken simultaneously.
  • Figure 3: Optical training performance. (a) Decision boundary charts of the ONN inference output for three different classification tasks, after the ONN has been trained optically (top) or in silico (bottom). (b) Learning curves of the ONN for classification of the 'rings' dataset, showing the mean and standard deviation of the validation loss and accuracy over 5 repeated training runs. Shown above are decision boundary charts of the ONN output for the test set after different epochs. (c) Evolution of output neuron values, and of output errors, for the training set inputs of the two classes.
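
The training procedure described in the Figure 1 and Figure 2 captions (MVM-1, SA activation, MVM-2a forwards, then MVM-2b backwards with the probe transmission acting as the activation gradient) can be sketched in plain Python. This is a minimal digital emulation under stated assumptions, not the authors' implementation. It assumes the same two-level SA model with a hypothetical optical depth `alpha0`, a squared-error loss, and illustrative layer sizes; all function names are invented for this sketch.

```python
import math


def matvec(W, x):
    """Matrix-vector product; stands in for one optical MVM."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sa(z, alpha0):
    """Assumed saturable-absorber activation (field in saturation units)."""
    return z * math.exp(-0.5 * alpha0 / (1.0 + z * z))


def probe(z, alpha0):
    """Assumed transmission of a weak backward probe for pump field z."""
    return math.exp(-0.5 * alpha0 / (1.0 + z * z))


def forward(W1, W2, x, alpha0):
    z1 = matvec(W1, x)                    # MVM-1: first layer forwards
    a1 = [sa(z, alpha0) for z in z1]      # hidden activations via SA
    y = matvec(W2, a1)                    # MVM-2a: second layer forwards
    return z1, a1, y


def loss(W1, W2, x, target, alpha0):
    """Squared-error loss (an assumption; computed in electronics)."""
    _, _, y = forward(W1, W2, x, alpha0)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def backward(W2, x, z1, a1, y, target, alpha0):
    delta2 = [yi - ti for yi, ti in zip(y, target)]   # output error
    # MVM-2b: second layer backwards, i.e. multiplication by W2 transposed
    back = [sum(W2[k][j] * delta2[k] for k in range(len(W2)))
            for j in range(len(a1))]
    # The probe transmission replaces the exact activation derivative
    delta1 = [b * probe(z, alpha0) for b, z in zip(back, z1)]
    grad_W1 = [[d * xi for xi in x] for d in delta1]
    grad_W2 = [[d * a for a in a1] for d in delta2]
    return grad_W1, grad_W2


def sgd_step(W, grad, lr):
    """Plain gradient-descent update (done in electronics)."""
    return [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad)]
```

A single update would chain `forward`, `backward`, and `sgd_step` for each weight matrix. Because the probe transmission only approximates the activation derivative, `grad_W1` is a surrogate gradient. `grad_W2` does not involve the activation derivative and is exact for this loss.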