120 GOPS Photonic Tensor Core in Thin-film Lithium Niobate for Inference and in-situ Training

Zhongjin Lin; Bhavin J. Shastri; Shangxuan Yu; Jingxiang Song; Yuntao Zhu; Arman Safarnejadian; Wangning Cai; Yanmei Lin; Wei Ke; Mustafa Hammood; Tianye Wang; Mengyue Xu; Zibo Zheng; Mohammed Al-Qadasi; Omid Esmaeeli; Mohamed Rahim; Grzegorz Pakulski; Jens Schmid; Pedro Barrios; Weihong Jiang; Hugh Morison; Matthew Mitchell; Xun Guan; Nicolas A. F. Jaeger; Leslie A. Rusch; Sudip Shekhar; Wei Shi; Siyuan Yu; Xinlun Cai; Lukas Chrostowski

TL;DR

A fully integrated photonic tensor core, consisting of only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver, achieves a computational speed of 120 GOPS for neural networks. It also supports in-situ training, including a method for multiplying two negative numbers.

Abstract

Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing by enabling low-latency, high-speed, and energy-efficient computations. However, conventional photonic tensor cores face significant challenges in constructing large-scale photonic neuromorphic networks. Here, we propose a fully integrated photonic tensor core, consisting of only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver. Despite its simple architecture, it is capable of implementing an entire layer of a neural network with a computational speed of 120 GOPS, while also allowing flexible adjustment of the number of inputs (fan-in) and outputs (fan-out). Our tensor core supports rapid in-situ training with a weight update speed of 60 GHz. Furthermore, it successfully classifies (supervised learning) and clusters (unsupervised learning) $112 \times 112$-pixel images through in-situ training. To enable in-situ training for clustering AI tasks, we offer a solution for performing multiplications between two negative numbers.
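The time-multiplexed dot product described in the abstract can be illustrated with a small numerical model. This is a simplified sketch, not the authors' implementation. It assumes that each modulator imprints one vector element per time slot as an optical transmission between 0 and 1, that cascading the two modulators multiplies those transmissions, and that the integrator sums the detected signal over all time slots. The names `photonic_dot` and `ops_per_second`, the Gaussian noise term, and the counting of one multiplication plus one accumulation per time slot are assumptions made for illustration.

```python
import numpy as np

def photonic_dot(x, w, noise_std=0.0, rng=None):
    """Idealized dot product: two cascaded modulators followed by an integrator.

    x, w: equal-length sequences of transmissions in [0, 1] (an assumption of
    this sketch; signed values are not modelled here).
    noise_std: hypothetical standard deviation of additive readout noise.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.shape != w.shape:
        raise ValueError("x and w must have the same shape")
    if np.any((x < 0) | (x > 1)) or np.any((w < 0) | (w > 1)):
        raise ValueError("transmissions must lie in [0, 1]")

    # In each time slot the optical power after both modulators is
    # proportional to the product of the two transmissions.
    power_per_slot = x * w

    # The integrator accumulates the detected signal over all time slots.
    accumulated = power_per_slot.sum()

    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        accumulated += rng.normal(0.0, noise_std)
    return float(accumulated)

def ops_per_second(symbol_rate_hz, ops_per_symbol=2):
    """Throughput if each time slot counts as one multiply and one accumulate."""
    return symbol_rate_hz * ops_per_symbol

if __name__ == "__main__":
    print(photonic_dot([0.5, 1.0], [0.5, 0.25]))  # 0.5
    print(ops_per_second(60e9))                   # 120000000000.0
```

Under the counting assumption above, a 60 GHz update rate corresponds to 120 GOPS, which is consistent with the two figures quoted in the abstract. The vector length is set only by how many time slots are integrated, which is one way to read the abstract's statement that fan-in can be adjusted flexibly.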

Paper Structure (20 sections, 6 equations, 6 figures)

Introduction
Results
Discussion
Methods
Data availability
Code availability
Acknowledgements
Author contributions
Competing interests

Figures (6)

Figure 1: Concept of our integrated photonic tensor core (IPTC). a Top: Applications and functions of artificial intelligence (AI) [mwase2022communication, beath2012finding, lin2023high, rabah2018convergence]. AI systems require processors that can adapt to analyze data from different devices for various AI tasks, including supervised and unsupervised learning. Bottom: A schematic of our proposed IPTC, consisting of four physical components: a laser, two thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs), and a charge-integration photoreceiver. Using these four physical components, our processor can implement an entire layer of a neural network. b A schematic of a conventional wavelength-division multiplexing (WDM)-based IPTC which includes $m$ neurons. PCM: phase change material. c Comparison of our device with several state-of-the-art photonic tensor cores [xu202111, feldmann2021parallel, mourgias2022noise, ashtiani2022chip, sludds2022delocalized, shen2017deep, zhang2021optical] in terms of compactness, dot product operation principle, computational speed, and the available dimension of the vectors in a dot product. Here, the available dimension is the vector dimension for which the processor completely executes the dot product operation without the assistance of traditional digital electronic computers.
Figure 2: A prototype of the packaged device. a Photo of the entire device. Top: our hybrid integrated chip. Bottom: the electrical control and power supply circuits of the charge-integration photoreceiver. b A micrograph of our hybrid integrated chip, including the fabricated TFLN photonic circuit, flip-chip photodetectors, and laser. c-e Zoomed-in micrographs of the flip-chip photodetectors, the travelling-wave electrode of the modulator, and the laser, respectively. f A side-view micrograph of our device, showing the relative heights of the TFLN chip, laser, and photodetectors. g The light-current-voltage curves for the light coupled into the TFLN chip from the laser. h Electro-optic bandwidth ($\text{S}_{21}$ parameter) of the modulator. i The output voltage of the photoreceiver as a function of integration time at a fixed optical power. In a balanced detection scheme, the output voltage variation of the integrator is positive when the optical power received by PD1 is lower than that received by PD2, and negative when it is higher.
Figure 3: Experimental results for the dot product operation with our device. a A schematic of the working principle of our device. The light is emitted from a laser and then passes through two cascaded thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs). The TFLN MZMs are driven by a high-speed arbitrary waveform generator (AWG). The light is then received by two PDs in a balanced detection scheme, and the corresponding photogenerated electrons are accumulated in the integrator. Reading the output voltage of the integrator with an ADC gives the dot product result. PD: photodetector. ADC: analog-to-digital converter. IC: integrated circuit. DAC: digital-to-analog converter. b The results of the dot product operation between two 131072-dimensional vectors performed by our device with a computational speed of 120 GOPS. Compared with the expected dot product results, the error of the measured ones has a standard deviation of 0.03 (6.04 bits).
Figure 4: Classification results of handwritten digits using our device. a A block diagram of a multilayer perceptron neural network, which consists of an input layer, two hidden layers, and an output layer that provides classification outputs. b A schematic of the in-situ training, a form of online training, where our IPTC handles forward propagation while the computer manages the nonlinearity function and backpropagation. c The validation accuracy as a function of epoch for the in-situ training scheme (solid red line) compared to that for training run solely on a central processing unit (CPU, dashed blue line). d and e Theoretically calculated confusion matrices (run purely on the CPU) and experimental confusion matrices (run on our IPTC) using the MNIST large-scale database [jansson2022scale]. For in-situ training, 2000 handwritten digits are used for training and 500 digits are used for testing. Our IPTC achieves classification accuracy comparable to that achieved by the CPU.
Figure 5: Clustering results of the handwritten digits using our device. a A schematic of the working principle of our device to perform the multiplication between two numbers with any signs, including two negative numbers. MZM: Mach-Zehnder modulator. b The variance of the projected points, $\mathbf{X}\mathbf{b}_{i}$, as a function of iterations when finding the first principal component using the power method. $\mathbf{X}$ is a $p\times n$ data matrix, representing 10000 handwritten digits from the MNIST large-scale database [jansson2022scale]. $\mathbf{b}_{i}$ is an $n\times 1$ unit vector, obtained at the $i^{\text{th}}$ iteration. The algorithm performed by our device has an iteration speed comparable to that of the central processing unit (CPU). c and d are the front and rear views of the 3D coordinates of each handwritten digit based on the scores of projecting onto the first three principal components (PCs), respectively.
...and 1 more figure
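The power method mentioned in the Figure 5 caption can be sketched in a few lines. This is a generic textbook version run entirely in NumPy, not the authors' code: on the device, the matrix-vector products would presumably be the part carried out optically. The function name `first_principal_component`, the iteration count, the random starting vector, and the column-wise centering step are assumptions for illustration.

```python
import numpy as np

def first_principal_component(X, iterations=50, seed=0):
    """Power iteration for the first principal component of a p x n data matrix.

    Returns the unit vector b (length n) and the variance of the projected
    points X b recorded at each iteration.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # center each feature (column)

    rng = np.random.default_rng(seed)
    b = rng.normal(size=Xc.shape[1])   # random starting direction
    b /= np.linalg.norm(b)

    variances = []
    for _ in range(iterations):
        scores = Xc @ b                # project every sample onto b
        variances.append(scores.var())
        b = Xc.T @ scores              # apply X^T X to the current b
        b /= np.linalg.norm(b)         # renormalize to a unit vector
    return b, variances

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data whose first feature has the largest spread.
    X = rng.normal(size=(200, 5)) * np.array([5.0, 1.0, 0.5, 0.2, 0.1])
    b, variances = first_principal_component(X)
    print(np.round(np.abs(b), 3))
    print(variances[0], variances[-1])
```

Both the centered data and the vector `b` generally contain negative entries, so products of two negative numbers appear in every iteration. This is why signed multiplication matters for this kind of clustering task.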
