Single chip photonic deep neural network with accelerated training

Saumil Bandyopadhyay, Alexander Sludds, Stefan Krastanov, Ryan Hamerly, Nicholas Harris, Darius Bunandar, Matthew Streshinsky, Michael Hochberg, Dirk Englund

TL;DR

A fully integrated coherent optical neural network architecture for a deep neural network with six neurons and three layers that optically computes both linear and nonlinear functions with a latency of 410 ps is experimentally demonstrated, unlocking new applications that require ultrafast, direct processing of optical signals.

Abstract

As deep neural networks (DNNs) revolutionize machine learning, energy consumption and throughput are emerging as fundamental limitations of CMOS electronics. This has motivated a search for new hardware architectures optimized for artificial intelligence, such as electronic systolic arrays, memristor crossbar arrays, and optical accelerators. Optical systems can perform linear matrix operations at exceptionally high rate and efficiency, motivating recent demonstrations of low latency linear algebra and optical energy consumption below a photon per multiply-accumulate operation. However, demonstrating systems that co-integrate both linear and nonlinear processing units in a single chip remains a central challenge. Here we introduce such a system in a scalable photonic integrated circuit (PIC), enabled by several key advances: (i) high-bandwidth and low-power programmable nonlinear optical function units (NOFUs); (ii) coherent matrix multiplication units (CMXUs); and (iii) in situ training with optical acceleration. We experimentally demonstrate this fully-integrated coherent optical neural network (FICONN) architecture for a 3-layer DNN comprising 12 NOFUs and three CMXUs operating in the telecom C-band. Using in situ training on a vowel classification task, the FICONN achieves 92.7% accuracy on a test set, which is identical to the accuracy obtained on a digital computer with the same number of weights. This work lends experimental evidence to theoretical proposals for in situ training, unlocking orders of magnitude improvements in the throughput of training data. Moreover, the FICONN opens the path to inference at nanosecond latency and femtojoule per operation energy efficiency.
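
To make the layer structure concrete, here is a minimal numerical sketch of a forward pass through three linear stages separated by two nonlinear stages on six optical modes. Several pieces are assumptions rather than details from the paper: the normalized DFT matrix stands in for a programmed CMXU, `toy_activation` is a placeholder for a NOFU response, the placement of the nonlinear stages after the first two CMXUs is inferred from the count of 12 NOFUs on six modes, and the readout simply normalizes output powers. All function names are hypothetical.

```python
import cmath
import math

N_MODES = 6  # six neurons per layer, as stated in the TL;DR


def dft_unitary(n):
    """Stand-in unitary (normalized DFT matrix); NOT the weights used on chip."""
    return [[cmath.exp(2j * math.pi * r * c / n) / math.sqrt(n) for c in range(n)]
            for r in range(n)]


def matvec(u, a):
    """Linear step b = U a, the role played by one CMXU."""
    return [sum(u[r][c] * a[c] for c in range(len(a))) for r in range(len(u))]


def toy_activation(b, p_sat=0.5):
    """Placeholder nonlinearity (assumption): transmission grows with |b|^2."""
    return [z * (abs(z) ** 2 / (abs(z) ** 2 + p_sat)) for z in b]


def forward(x, unitaries):
    """Apply each unitary in turn, with a nonlinearity after all but the last."""
    a = list(x)
    for layer, u in enumerate(unitaries):
        a = matvec(u, a)
        if layer < len(unitaries) - 1:  # no nonlinear stage after the final CMXU
            a = toy_activation(a)
    powers = [abs(z) ** 2 for z in a]
    total = sum(powers)
    return [p / total for p in powers]  # assumed normalization of the readout


if __name__ == "__main__":
    u = dft_unitary(N_MODES)
    print(forward([1.0, 0.5, 0.0, 0.25, 0.75, 0.1], [u, u, u]))
```

The values stay complex from input to output, mirroring the fact that the field is not read out between layers; only the final normalization step works on real numbers.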

Paper Structure (6 sections, 2 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Architecture of the fully-integrated coherent optical neural network (FICONN). Inference is conducted entirely in the optical domain, without readout or amplification between layers. Light is fiber coupled into a single input on the chip and fanned out to the six channels of the transmitter (i). Each channel encodes the amplitude and phase of one element of the input $\mathbf{x}_{(j)}$ into the optical field $\mathbf{a}^{(1)}_{(j)}$ with a Mach-Zehnder modulator and an external phase shifter. The coherent matrix multiplication unit (ii), consisting of a Mach-Zehnder interferometer mesh, implements the linear transformation $\mathbf{b}_{(j)}^{(n)} = U^{(n)} \mathbf{a}_{(j)}^{(n)}$. Programmable nonlinear optical function units (iii) realize activation functions $\mathbf{a}_{(j)}^{(n+1)} = f(\mathbf{b}_{(j)}^{(n)})$ by tapping off part of the signal to a photodiode, which drives a cavity off-resonance by injecting carriers into the waveguide. An integrated coherent receiver (iv) reads out the DNN output by homodyning the output field with a local oscillator. Transimpedance amplifiers convert the output photocurrents to voltages, which are digitized and normalized to produce a quasi-probability distribution for a classification task. During in situ training, the model parameters $\mathbf{\Theta}$ are recurrently optimized to minimize the categorical cross-entropy over the training set $\mathbf{x}^{\mathrm{train}}_{(1),(2),...(N)}$.
  • Figure 2: a) Microscope image of the fabricated PIC. Key subsystems of the circuit are highlighted in the same color as the architecture depicted in Figure 1. The signal path through the PIC is indicated in white, while the local oscillator path is outlined in blue. b) Photonic packaging of the PIC for lab testing. Insets show side and top-down views of the packaged PIC. c) The fabricated transmitter splits off light coupled into the PIC to a local oscillator and fans out the remainder to six input channels. The inset shows the measured optical response of a typical channel. d) The coherent matrix multiplication unit is implemented with a Mach-Zehnder interferometer mesh. Each MZI comprises two directional couplers (DCs), an internal phase shifter $\theta_1$ between the two splitters, and an external phase shifter $\theta_2$ on one output mode. The histogram shows the measured fidelity of 500 arbitrary unitary matrices implemented on a single layer using a "direct" approach (orange) and an approach that takes into account hardware errors and thermal crosstalk (blue). e) The integrated coherent receiver (ICR). f) One channel of the ICR. Signal and LO are interfered on a 50-50 MMI and measured using balanced detectors.
  • Figure 3: a) The fabricated NOFU. A programmable MZI determines the fraction of light tapped off to the photodiode, and a waveguide delay line synchronizes the optical and electrical pulses. A pn-doped microring resonator modulates the incident field. b) Circuit diagram of resonant EO nonlinearity. The photocurrent $I_p$ directly drives a pn-doped resonant modulator. No amplifier stage is required between the two and the devices are directly connected on chip. By adjusting the bias voltage $V_B$, the nonlinearity can be operated in forward or reverse bias. c) Left: Detuning of the cavity resonance at various incident optical powers when operated in carrier injection mode ($V_B > 0$). Right: Cavity detuning in carrier depletion mode ($V_B < 0$). Our system realizes close to a linewidth detuning without the use of any amplifier, improving energy consumption and latency of the nonlinearity. A full linewidth detuning can be realized by further engineering the cavity finesse. d) Activation functions measured on chip. Arbitrary function shapes can be realized by adjusting the cavity detuning $\Delta \lambda$ and fraction of light $\beta$ tapped off to the photodiode.
  • Figure 4: a) A multivariate cost function $\mathcal{L}(\mathbf{\Theta})$ can be minimized by computing the directional derivative of the function along a random direction (black). This directs the optimization along the component of the gradient (red) parallel to the search direction. Over multiple iterations, the steps taken along random directions average to follow the direction of steepest descent to the minimum. b) In situ training procedure. At every iteration, the directional derivative of the cost function $\mathcal{L}(\mathbf{\Theta})$ is computed in hardware along a randomly chosen direction $\mathbf{\Delta}$ in the search space. $\mathbf{\Delta}$ is chosen from a Bernoulli distribution to be $\pm \delta$. The weights $\mathbf{\Theta}$ are then updated by the measured derivative following a learning rate $\eta$ chosen as a hyperparameter of the optimization. c) In situ training of a photonic DNN for vowel classification. We obtain 92.7% accuracy on a test set, which is the same as the performance (92.7%) obtained on a digital model with the same number of weights. Despite not having direct access to gradients, our approach produces a training curve similar to those produced by standard gradient descent algorithms.
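
The MZI described in the Figure 2d caption (two directional couplers, an internal phase shifter $\theta_1$, and an external phase shifter $\theta_2$) can be written as a product of 2x2 matrices. The sketch below assumes ideal, lossless 50:50 couplers in the symmetric convention and places both phase shifters on the top mode; the fabricated devices deviate from this ideal, which is why the caption mentions correcting for hardware errors. The names `mzi` and `bar_power` are hypothetical.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def phase_shifter(phi):
    """Phase shift phi applied to the top mode only (assumed placement)."""
    return [[cmath.exp(1j * phi), 0], [0, 1]]


# Ideal 50:50 directional coupler (assumed lossless, symmetric convention).
DC = [[1 / math.sqrt(2), 1j / math.sqrt(2)],
      [1j / math.sqrt(2), 1 / math.sqrt(2)]]


def mzi(theta1, theta2):
    """Light passes a DC, the internal shifter, a second DC, then the external shifter."""
    inner = matmul2(DC, matmul2(phase_shifter(theta1), DC))
    return matmul2(phase_shifter(theta2), inner)


def bar_power(theta1):
    """Fraction of input power that stays in the top mode."""
    return abs(mzi(theta1, 0.0)[0][0]) ** 2


if __name__ == "__main__":
    for theta in (0.0, math.pi / 2, math.pi):
        print(theta, bar_power(theta))
```

Under these conventions the internal phase sets the power splitting as $\sin^2(\theta_1/2)$, while the external phase only changes the relative phase of the outputs. A mesh of such blocks acting on neighbouring mode pairs is what lets the unit realize a larger unitary.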
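
The training loop in the Figure 4b caption can be sketched in a few lines. This is a software illustration under stated assumptions, not the authors' implementation: the caption does not say how the directional derivative is measured, so a two-sided (central) difference is assumed here, and `toy_loss` is a hypothetical quadratic standing in for the cross-entropy measured on hardware. The values of `delta`, `eta`, and the step count are arbitrary.

```python
import random


def directional_derivative_step(loss, theta, delta, eta, rng):
    """One update along a random direction whose entries are +delta or -delta."""
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    # Central-difference estimate of the slope along the sign vector (assumption).
    slope = (loss(plus) - loss(minus)) / (2.0 * delta)
    # Step against the slope, along the same random direction.
    return [t - eta * slope * s for t, s in zip(theta, signs)]


def toy_loss(theta):
    """Hypothetical quadratic cost; the chip would measure cross-entropy instead."""
    target = (0.3, -1.2, 0.8, 2.0)
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def train(steps=2000, delta=1e-3, eta=0.05, seed=0):
    """Repeat the random-direction update from a fixed starting point."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        theta = directional_derivative_step(toy_loss, theta, delta, eta, rng)
    return theta


if __name__ == "__main__":
    final = train()
    print(final, toy_loss(final))
```

Each iteration needs only two cost evaluations regardless of the number of parameters, and because the random signs are uncorrelated across parameters, the updates average to the true gradient direction, matching the picture in panel a).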