Photonic systolic array for all-optical matrix-matrix multiplication

Jungmin Kim, Qingyi Zhou, Zongfu Yu

TL;DR

The authors propose a photonic systolic array that performs matrix-matrix multiplication (MMM) entirely with optical signals, using homodyne detection at each array cell, as a step toward practical photonic computing hardware for modern AI workloads.

Abstract

Systolic arrays have proven highly efficient for parallelized matrix-matrix multiplication (MMM), using synchronized, heartbeat-like data flows across an array of processing elements. Optical structures such as waveguide crossbar arrays and Mach-Zehnder interferometer-based meshes serve as photonic counterparts of systolic arrays, but they treat the two input matrices differently: one is encoded in optical signals, while the other is set by system-defined parameters. This asymmetry creates a bottleneck in modern machine-learning tasks in which both operands change with the input data, such as evaluating attention scores in large language models. Here, we propose a photonic systolic array that performs MMM entirely with optical signals, using homodyne detection at each array cell. Adjoint-based design of compact on-chip freeform optical modules enables precise control of light flow without bulky waveguide coupling schemes. The operation of a $4\times4$ photonic systolic array is numerically verified, achieving a theoretical computation density of $6.2~\mathrm{PMACs}/\mathrm{mm}^2/\mathrm{s}$. This design marks a significant step toward practical photonic computing hardware for modern AI workloads.
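The heartbeat-like dataflow described above can be illustrated with a cycle-level sketch of a generic output-stationary systolic array (the layout of Figure 1b), written in Python with NumPy. This is a conventional, electronic-style model rather than the authors' photonic implementation: the register-shift timing, the input skew $t = k + m + n$, and the function name `systolic_matmul_output_stationary` are assumptions of this sketch. Row $k$ of $\mathbf{A}$ streams along one array axis, row $k$ of $\mathbf{B}$ along the other, and cell $(m, n)$ keeps its own accumulator for $C_{mn}$.

```python
import numpy as np


def systolic_matmul_output_stationary(A, B):
    """Cycle-level sketch of an output-stationary systolic array, C = A^T B.

    A has shape (K, M) and B has shape (K, N). Cell (m, n) holds C[m, n].
    ASSUMPTION: inputs are skewed so that A[k, m] and B[k, n] meet at
    cell (m, n) on clock cycle t = k + m + n.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    K, M = A.shape
    K_b, N = B.shape
    if K != K_b:
        raise ValueError("A and B must have the same number of rows")

    C = np.zeros((M, N))      # stationary outputs, one per cell
    a_reg = np.zeros((M, N))  # A value held in each cell (flows along +n)
    b_reg = np.zeros((M, N))  # B value held in each cell (flows along +m)

    for t in range(K + M + N - 2):
        # Advance the data one cell per cycle (the "heartbeat")
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # Inject skewed inputs at the array edges (zeros outside the data)
        for m in range(M):
            k = t - m
            a_reg[m, 0] = A[k, m] if 0 <= k < K else 0.0
        for n in range(N):
            k = t - n
            b_reg[0, n] = B[k, n] if 0 <= k < K else 0.0
        # Every cell multiplies its two inputs and accumulates in place
        C += a_reg * b_reg
    return C


rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(5, 4))
B = rng.uniform(-1, 1, size=(5, 4))
print(np.allclose(systolic_matmul_output_stationary(A, B), A.T @ B))
```

With a single row ($K = 1$), the same routine returns the outer product $\mathbf{A}^T\mathbf{B}$ of two vectors, the case simulated in Figure 4; for example, inputs `[[0, 1, 0, 0]]` and `[[0, 0, 1, 0]]` give a single nonzero entry `C[1, 2] = 1` (zero-based indices).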

Paper Structure

This paper contains 7 sections, 4 equations, and 4 figures.

Figures (4)

  • Figure 1: Concept of the photonic systolic array. a, Weight-stationary type: input signals $\mathbf{X}$ are transformed into output signals $\mathbf{Y} = \mathbf{W}\mathbf{X}$ by system parameters $\mathbf{W}$. b, Output-stationary type: two input signals ($\mathbf{A}$ and $\mathbf{B}$) are multiplied on an array of multiply-accumulate (MAC) units, producing the stationary output $\mathbf{C} = \mathbf{A}^T\mathbf{B}$. The array is interconnected by vertical and horizontal waveguides that carry the amplitude-modulated optical pulses of the two inputs, one input per direction, for parallel MAC operations. c, Each optical MAC unit consists of a waveguide crossing, two branches indexed by $m$ and $n$, and a beam splitter for homodyne detection. The waveguides and submodules are built from a Si slab of uniform thickness $h_\mathrm{wg}$, embedded in a SiO$_2$ buried oxide layer and cladding. Parameters: $L = 3.50~\mu\mathrm{m}$, $h_\mathrm{wg} = 0.22~\mu\mathrm{m}$, $w_\mathrm{wg} = 0.3~\mu\mathrm{m}$.
  • Figure 2: Submodule designs. a, Inverse-design results from adjoint optimization: crossing, beam splitter, and branches for $n = 1, 2, 3,$ and $4$ (left to right). Black contours and red dashed boxes mark the Si/SiO$_2$ boundary and the square design region (side length 3.5 $\mu$m), respectively. The colormap shows the wave flow, $\Re(H_z)$, under TM$_{00}$ mode excitation at the bottom port (black arrows, $f = f_0$). b, Transmission spectra of the structures in a for the two output ports (yellow and teal arrows). Horizontal lines mark the target transmission values at the carrier frequency $f_0 = 193.4$ THz.
  • Figure 3: Time-domain operation of the $(2,2)$-MAC unit. a,b, Illustration of the $(2,2)$-MAC unit and the corresponding wave propagation for input combinations $(a,b) = (1,0)$ (a) and $(1,1)$ (b) at $f = f_0$. c,d, Scalar multiplication results for $-1 < a, b < 1$ using a finite-width pulse (c) and a continuous wave ($f = f_0$, d). e, Evolution of Gaussian pulses from the input (top) to the through port (middle) and the two detection ports (bottom) for several incident pulse widths (1-4). f, Signal deformation as a function of incident pulse width: overlap coefficient between the two detection signals (black line) and dispersion factor through the MAC unit (blue line), as illustrated in the inset diagrams.
  • Figure 4: Full-wave simulation of a photonic systolic array (PSA) computing the outer product of two vectors. a, Example input (A and B) and output (C and D) signals for a $4\times4$ PSA. Gray dashed lines mark the peak-power times for MAC units with $m+n = 8, 7, \cdots, 2$ (left to right). b, Field profile at the waveguide plane. c, Out-of-plane power emission through grating couplers at the detection plane ($z = 0.968~\mu\mathrm{m}$). Black and red boxes mark the four submodules of a MAC unit and the additional grating couplers, respectively. Input ports and output detectors are marked with blue (A) and yellow (B) arrows and green (C) and purple (D) dashed boxes. The lattice period of the array is 12.7 $\mu$m. d-g, Outer-product results: ground-truth values for the inputs in a (d); raw output in units of the power constant $P_\mathrm{max}$ defined in the main text (e); element-wise normalized output for error correction (f); and element-wise normalized output for inputs $\mathbf{A} = [0,1,0,0]$ and $\mathbf{B} = [0,0,1,0]$ (g).
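Figures 1c and 3 describe each MAC unit as combining its two input pulses on a beam splitter and reading them out by homodyne detection. The idealized principle can be sketched in Python, assuming a lossless 50:50 beam splitter with a real-valued transfer matrix, real in-phase field amplitudes in the range $-1 < a, b < 1$, and ideal square-law detectors. The helper name `homodyne_mac` is hypothetical, and modeling accumulation over successive pulses as a plain sum is an assumption of this sketch, not the paper's detector model.

```python
import numpy as np


def homodyne_mac(a_pulses, b_pulses):
    """Idealized balanced-homodyne multiply-accumulate (hypothetical helper).

    a_pulses, b_pulses: real field amplitudes of successive pulses reaching
    one MAC unit, assumed to share the same optical phase reference.
    Returns the accumulated product sum(a * b).
    """
    a = np.asarray(a_pulses, dtype=complex)
    b = np.asarray(b_pulses, dtype=complex)
    # ASSUMPTION: lossless 50:50 splitter, transfer matrix [[1, 1], [1, -1]] / sqrt(2)
    out_plus = (a + b) / np.sqrt(2)
    out_minus = (a - b) / np.sqrt(2)
    # Square-law photodetectors measure optical power at each output port
    p_plus = np.abs(out_plus) ** 2
    p_minus = np.abs(out_minus) ** 2
    # Balanced difference per pulse: p_plus - p_minus = 2 Re(a * conj(b))
    # ASSUMPTION: accumulation over pulses is modeled as a plain sum
    return float(np.sum(p_plus - p_minus) / 2)


print(homodyne_mac([0.5, 0.3], [0.2, -1.0]))  # approximately 0.5*0.2 + 0.3*(-1.0)
```

The difference of the two detected powers is $|a+b|^2/2 - |a-b|^2/2 = 2\,\mathrm{Re}(a b^*)$, which reduces to $2ab$ for real, in-phase amplitudes, so the sign of the product is preserved. A relative phase of $\pi/2$ between the inputs would make this term vanish, which is why both signals need a common phase reference.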
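The theoretical density of $6.2~\mathrm{PMACs}/\mathrm{mm}^2/\mathrm{s}$ quoted in the abstract can be cross-checked against the 12.7 $\mu$m lattice period given for Figure 4. The text does not state how the density was derived; this sketch assumes that each MAC unit occupies one lattice period squared and performs one MAC per clock period, and the helper name `implied_mac_period` is hypothetical.

```python
def implied_mac_period(density_macs_per_mm2_s: float, lattice_period_um: float) -> float:
    """Return the clock period (s) per MAC unit implied by a computation density.

    ASSUMPTION: one MAC unit per lattice_period x lattice_period cell,
    performing one MAC per clock period.
    """
    cell_area_mm2 = (lattice_period_um * 1e-3) ** 2  # convert um -> mm, then square
    macs_per_cell_per_s = density_macs_per_mm2_s * cell_area_mm2
    return 1.0 / macs_per_cell_per_s


period_s = implied_mac_period(6.2e15, 12.7)  # 6.2 PMACs/mm^2/s, 12.7 um lattice
print(f"implied period per MAC: {period_s * 1e12:.3f} ps")
```

Under these assumptions the cell area is about $1.61\times10^{-4}~\mathrm{mm}^2$, each cell performs about $10^{12}$ MACs per second, and the implied period is about 1 ps. Whether this matches the pulse timing used in the paper is not stated in the text above.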