Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels

Oluwaseun Adewunmi Alo, Sairam Sri Vatsavai, Ishan Thakkar

TL;DR

This work addresses the bottleneck of byte-size GEMM support in analog photonic accelerators for DNN training. It introduces SPOGA, a photonic GEMM accelerator that uses homodyne optical signals and in-transduction positional weighting to extend the dataflow and avoid bit-sliced post-processing. The architecture combines Optical Analog Multiplier Ensembles (OAME), aggregation lanes, and a Positional Weighting and Accumulation Block (PWAB) to perform INT8 GEMMs with reduced overhead, achieving up to 14.4× higher FPS, 2× higher FPS/W, and 28.5× higher FPS/W/mm² compared to prior photonic solutions. System-level simulations on four CNNs demonstrate broad improvements in throughput and energy efficiency, underscoring SPOGA’s potential for scalable, efficient photonic acceleration of DNN training workloads.

Abstract

Deep Neural Networks (DNNs) predominantly rely on General Matrix Multiply (GEMM) kernels, which are often accelerated using specialized hardware architectures. Recently, analog photonic GEMM accelerators have emerged as a promising alternative, offering vastly superior speed and energy efficiency compared to traditional electronic accelerators. However, these photonic accelerators cannot support integer operands wider than 4 bits due to their inherent trade-offs between analog dynamic range and parallelism. This is often inadequate for DNN training, as operands at least 8 bits wide are deemed necessary to prevent significant accuracy drops. To address these limitations, we introduce a scalable photonic GEMM accelerator named SPOGA. SPOGA utilizes enhanced features such as analog summation of homodyne optical signals and in-transduction positional weighting of operands. By employing an extended optical-analog dataflow that minimizes overheads associated with bit-sliced integer arithmetic, SPOGA supports byte-size integer GEMM kernels, achieving significant improvements in throughput, latency, and energy efficiency. Specifically, SPOGA demonstrates up to 14.4$\times$, 2$\times$, and 28.5$\times$ improvements in frames-per-second (FPS), FPS/Watt, and FPS/Watt/mm$^2$, respectively, compared to existing state-of-the-art photonic solutions.
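The bit-sliced integer arithmetic mentioned in the abstract can be illustrated numerically. The sketch below is not the SPOGA dataflow; it only shows the arithmetic identity that positional weighting relies on: an 8-bit GEMM can be assembled from four 4-bit partial GEMMs, each scaled by a power of 16. It assumes unsigned operands and a 4-bit slice width, and the function names (`matmul`, `slice_nibbles`, `bit_sliced_gemm`) are hypothetical, chosen for illustration only.

```python
# Illustrative sketch only: bit-sliced GEMM on unsigned 8-bit operands.
# The 4-bit slice width and all function names are assumptions, not SPOGA's design.

def matmul(a, b):
    """Plain integer matrix multiply on lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def slice_nibbles(m):
    """Split each unsigned 8-bit entry into (low, high) 4-bit slices."""
    low = [[x & 0xF for x in row] for row in m]
    high = [[(x >> 4) & 0xF for x in row] for row in m]
    return low, high

def bit_sliced_gemm(a, b):
    """Compute a @ b from four 4-bit partial GEMMs weighted by powers of 16."""
    a_slices = slice_nibbles(a)  # index 0 = low nibble, 1 = high nibble
    b_slices = slice_nibbles(b)
    rows, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for p, a_s in enumerate(a_slices):
        for q, b_s in enumerate(b_slices):
            partial = matmul(a_s, b_s)   # a GEMM on 4-bit operands only
            weight = 16 ** (p + q)       # positional weight of this slice pair
            for i in range(rows):
                for j in range(cols):
                    out[i][j] += weight * partial[i][j]
    return out

A = [[200, 17], [3, 255]]
B = [[129, 64], [5, 250]]
assert bit_sliced_gemm(A, B) == matmul(A, B)
```

In a conventional bit-sliced design, the weighting and accumulation in the inner loops are a separate post-processing step. According to the abstract, SPOGA instead applies the positional weighting during transduction, which is how it reduces the overhead associated with this scheme.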

Paper Structure (17 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Illustration of a General Matrix Multiplication (GEMM) operation and its mapping on photonic GEMM cores.
  • Figure 2: Different methods of implementing and mapping a GEMM function for hardware acceleration using bit-sliced integer arithmetic.
  • Figure 3: SPOGA Architecture Overview. (a) An Optical Analog Multiplier Ensemble (OAME), a component that composes a SPOGA GEMM core comprising Dot Product Units (DPUs) in (c). (b) A Balanced Photo Charge Accumulator (BPCA) that composes the Positional Weighting and Accumulation Block (PWAB) in (a) and (c).
  • Figure 4: Schematic of system-level implementation of SPOGA.
  • Figure 5: Evaluation results for SPOGA versus HOLYLIGHT (MAW) and DEAPCNN (AMW) accelerators at 5 GS/s and 10 GS/s data rates.