A Comparative Analysis of Microrings Based Incoherent Photonic GEMM Accelerators

Sairam Sri Vatsavai; Venkata Sai Praneeth Karempudi; Oluwaseun Adewunmi Alo; Ishan Thakkar

TL;DR

This work addresses accelerating GEMMs for CNNs with incoherent photonic dot-product units (DPUs) built from microring resonators (MRRs). It classifies DPU organizations by the order of their aggregation (A), splitting (S), modulation (M), and weighting (W) blocks into ASMW, MASW, and SMWA, and analyzes how crosstalk and signal losses depend on this order, which in turn shapes the achievable parallelism and power efficiency. Through circuit-level and system-level evaluation on four CNNs, the study shows that the SMWA organization supports the largest feasible DPU size $N$ under realistic optical power budgets and delivers substantial gains in throughput (through the parallelism that a larger $N$ allows), energy efficiency, and area efficiency compared with ASMW and MASW. The findings offer concrete guidance for designing scalable photonic GEMM accelerators and demonstrate the practical impact of block order on performance and energy metrics in CNN inference.

Abstract

Several microring resonator (MRR) based analog photonic architectures have been proposed to accelerate general matrix-matrix multiplications (GEMMs) in deep neural networks with exceptional throughput and energy efficiency. To implement GEMM functions, these MRR-based architectures generally manipulate optical signals in five different ways: (i) splitting (copying) of multiple optical signals to achieve a certain fan-out, (ii) aggregation (multiplexing) of multiple optical signals to achieve a certain fan-in, (iii) modulation of optical signals to imprint input values onto the analog signal amplitude, (iv) weighting of modulated optical signals to achieve analog input-weight multiplication, and (v) summation of optical signals. Existing MRR-based GEMM accelerators perform the first four manipulations in an arbitrary order, ignoring the possible impact of this order on their performance. In this paper, we conduct a detailed analysis of accelerator organizations with three different orders of these manipulations: (1) Modulation-Aggregation-Splitting-Weighting (MASW), (2) Aggregation-Splitting-Modulation-Weighting (ASMW), and (3) Splitting-Modulation-Weighting-Aggregation (SMWA). We show that these organizations affect crosstalk noise and optical signal losses to different degrees, which gives them different levels of processing parallelism at the circuit level and different throughput and energy-area efficiency at the system level. Our evaluation results for four CNN models show that the SMWA organization achieves up to 4.4$\times$, 5$\times$, and 5.2$\times$ better throughput, energy efficiency, and area-energy efficiency, respectively, compared to ASMW and MASW organizations on average.
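To make the five manipulations concrete, the following is a minimal sketch of an idealized, lossless dot-product unit in plain Python. Every function name, the normalization of inputs and weights to [0, 1], and the unit carrier power are assumptions made for illustration; none of them come from the paper. In this ideal model the order of aggregation, splitting, modulation, and weighting does not change the numerical result; the paper's analysis concerns the crosstalk and losses that this sketch deliberately leaves out.

```python
# Idealized, lossless model of an incoherent photonic dot-product unit (DPU).
# All names and value ranges are illustrative assumptions, not the paper's.

def modulate(carrier_powers, inputs):
    """Modulation: imprint input values in [0, 1] onto each carrier's power."""
    return [p * x for p, x in zip(carrier_powers, inputs)]

def weight(signals, weights):
    """Weighting: attenuate each modulated signal by a weight in [0, 1]."""
    return [s * w for s, w in zip(signals, weights)]

def summate(signals):
    """Summation: a photodetector adds the incoherent optical powers."""
    return sum(signals)

def dot_product_arm(inputs, weights, carrier_power=1.0):
    """One arm: N wavelengths (aggregated on one waveguide), one per input element."""
    carriers = [carrier_power] * len(inputs)
    return summate(weight(modulate(carriers, inputs), weights))

def dpu(inputs, weight_rows):
    """Splitting: fan the same N inputs out to M arms, one per weight row."""
    return [dot_product_arm(inputs, row) for row in weight_rows]

# N = 3 input elements, M = 2 weight rows.
result = dpu([0.5, 0.25, 1.0], [[1.0, 0.0, 0.5], [0.5, 1.0, 0.25]])
print(result)
```

Each output element is one dot product between the shared input vector and one weight row, so a DPU with $N$ wavelengths and $M$ arms evaluates $M$ length-$N$ dot products in parallel.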

Paper Structure (25 sections, 3 equations, 7 figures, 6 tables)

This paper contains 25 sections, 3 equations, 7 figures, 6 tables.

Introduction
Preliminaries
Processing of CNNs on Hardware Accelerators
Related Work on Optical GEMM Accelerators
Organizations of MRR-based GEMM Accelerators
Description of Various Blocks that Manipulate Optical Channels in MRR-based GEMM Accelerators
ASMW DPU Organization
MASW DPU Organization
SMWA DPU Organization
Motivation
Circuit-Level Comparative Analysis
Impacts on Power Penalty Due to Crosstalk Effects
Inter-modulation crosstalk
Cross-weight penalty
MRR Filter Penalty
...and 10 more sections

Figures (7)

Figure 1: Convolution operation at a convolution layer with two weight filters and one input feature map (Fmap) having two channels is transformed into a GEMM operation between input matrix I and weight matrix W.
Figure 2: (a) Common optical signal manipulation blocks found in optical DPUs. Illustration of common incoherent photonic DPU organizations: (b) ASMW DPU, (c) MASW DPU, and (d) SMWA DPU.
Figure 3: Conceptual breakdown of optical power budget usage and dependency of DPU size N on supported bit precision B for different values of B={2, 3}-bits across datarates DR={1, 5} GS/s.
Figure 4: (a) Types of losses and power penalties at different optical signal manipulation blocks of optical DPUs. Illustration of (b) inter-modulation crosstalk at MRM input arrays [padmaraju2014intermodulation, karempudi2022], and (c) filter crosstalk and signal truncation at filters [filterpenalty].
Figure 5: Supported DPU size N (=M) for bit precision={1, 2, 3, 4, 5, 6, 7, 8} bits at data rates (DRs)={1, 5, 10} GS/s, for AMW, MAW, and MWA DPUs.
...and 2 more figures
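
The convolution-to-GEMM transformation described in the Figure 1 caption can be sketched as follows for two weight filters and one input Fmap with two channels. This is an assumed im2col-style layout (stride 1, no padding, one row per patch position); the helper names and the example values are hypothetical, and the paper's exact arrangement of the input matrix I and weight matrix W may differ.

```python
# Sketch: a convolution layer rewritten as a GEMM between an input matrix
# and a weight matrix. The layout and names are illustrative assumptions.

def im2col(fmap, k):
    """Unroll every k x k patch (all channels) into one row of the input matrix."""
    h, w = len(fmap[0]), len(fmap[0][0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([ch[i + di][j + dj]
                         for ch in fmap
                         for di in range(k)
                         for dj in range(k)])
    return rows

def flatten_filter(filt):
    """Flatten one multi-channel filter in the same channel/row/column order."""
    return [v for ch in filt for row in ch for v in row]

def gemm(a_rows, b_rows):
    """Multiply a P x K matrix by the transpose of an F x K matrix, giving P x F."""
    return [[sum(x * y for x, y in zip(a, b)) for b in b_rows] for a in a_rows]

fmap = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],  # channel 1
]
filters = [
    [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],  # filter 0: two 2 x 2 channels
    [[[0, 1], [1, 0]], [[0, 0], [0, 1]]],  # filter 1: two 2 x 2 channels
]

I_mat = im2col(fmap, 2)                       # 4 patch positions x 8 values
W_mat = [flatten_filter(f) for f in filters]  # 2 filters x 8 values
out = gemm(I_mat, W_mat)                      # 4 positions x 2 output channels
print(out)
```

Each row of `out` holds the two output-channel values for one patch position, so every entry is a dot product of the kind a DPU evaluates.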
