Mirage: An RNS-Based Photonic Accelerator for DNN Training

Cansu Demirkiran; Guowei Yang; Darius Bunandar; Ajay Joshi

TL;DR

Mirage addresses the precision bottleneck of photonic DNN training by combining Block Floating Point (BFP) with Residue Number System (RNS) arithmetic, so that high-precision results are assembled from low-precision modular operations in the analog photonic core. It introduces a photonic micro-architecture (MMU/MDPU/MMVMU) and a three-moduli RNS dataflow that computes modular GEMMs and reconstructs results via the Chinese Remainder Theorem (CRT), enabling training accuracy comparable to FP32 for state-of-the-art DNNs. Compared to systolic arrays, and averaged across several DNNs, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower energy-delay product (EDP) in an iso-energy scenario, and $42.8\times$ lower power with comparable or better EDP in an iso-area scenario. These results suggest that hybrid RNS-BFP photonic accelerators can deliver both energy efficiency and precision for large-scale DNN training, with potential extensions to inference and other analog platforms.

Abstract

Photonic computing is a compelling avenue for performing highly efficient matrix multiplication, a crucial operation in Deep Neural Networks (DNNs). While this method has shown great success in DNN inference, meeting the high precision demands of DNN training proves challenging due to the precision limitations imposed by costly data converters and the analog noise inherent in photonic hardware. This paper proposes Mirage, a photonic DNN training accelerator that overcomes the precision challenges in photonic hardware using the Residue Number System (RNS). RNS is a numeral system based on modular arithmetic, allowing us to perform high-precision operations via multiple low-precision modular operations. In this work, we present a novel micro-architecture and dataflow for an RNS-based photonic tensor core performing modular arithmetic in the analog domain. By combining RNS and photonics, Mirage provides high energy efficiency without compromising precision and can successfully train state-of-the-art DNNs achieving accuracy comparable to FP32 training. Our study shows that on average across several DNNs when compared to systolic arrays, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower EDP in an iso-energy scenario and consumes $42.8\times$ lower power with comparable or better EDP in an iso-area scenario.
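To make the RNS idea in the abstract concrete, the following is a minimal sketch of a dot product computed independently in several small moduli and then recombined with the Chinese Remainder Theorem. The moduli `(31, 32, 33)`, the function names, and the signed-range mapping are illustrative assumptions, not values or interfaces taken from the paper.

```python
from math import prod

# Assumed, illustrative moduli: pairwise coprime, each fits in about 5-6 bits.
MODULI = (31, 32, 33)

def modular_dot(a, b, m):
    """Dot product of integer vectors a and b, reduced modulo m at every step."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + (x % m) * (y % m)) % m
    return acc

def crt(residues, moduli=MODULI):
    """Reconstruct the unique value in [0, M) from its residues (M = product of moduli)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M

def rns_dot(a, b, moduli=MODULI):
    """Signed dot product via per-modulus low-precision arithmetic plus CRT."""
    M = prod(moduli)
    residues = [modular_dot(a, b, m) for m in moduli]
    x = crt(residues, moduli)
    # Map [0, M) back to a signed range centred on zero.
    return x - M if x >= (M + 1) // 2 else x

a = [3, -7, 12, 5]
b = [-4, 9, 6, -11]
print(rns_dot(a, b))  # -58, the same as the ordinary integer dot product
```

The reconstruction is only exact while the true result stays within the range covered by the product of the moduli (here 32736 values), which is why the choice of moduli and the dot-product length have to be considered together.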

Paper Structure (34 sections, 14 equations, 9 figures, 3 tables)

Introduction
Background
DNN Training
Data Formats for DNNs
Bit Precision in Conventional Analog Cores
The Residue Number System (RNS)
Device Metrics and Noise Sources in Silicon Photonics
Modulation Mechanisms and Device Tradeoffs
Sources of Analog Noise
RNS-Based Dataflow in Mirage
Mirage Micro-architecture
Photonic Modular Arithmetic Units
Modular Multiplication Unit (MMU)
Modular Dot Product Unit (MDPU) and Modular MVM Unit (MMVMU)
Phase Detection Unit
...and 19 more sections

Figures (9)

Figure 1: (a) Dataflow for a conventional analog core. (b) Energy consumption per conversion in ADCs and DACs with varying bit precision. The energy-per-conversion numbers are estimated using equations formulated by Murmann [murmann21mixed].
Figure 2: Mirage's RNS-based dataflow for a single tiled-MVM operation as part of a forward pass. We show a four-moduli case in this figure as an example.
Figure 3: (a) Simple MZM with phase shifters with length $L$ and applied voltage $V$. (b) 3-bit modular multiplication using cascaded phase shifters. (c) Routing light using MRR switches. (d) 3-bit modular multiplication using MRR switches.
Figure 4: (a) RNS-based MMVM Unit (RNS-MMVMU) micro-architecture. (b) Phase detection unit. The top arms of the two rows detect the amplitude of the incoming signals directly while the bottom arms apply $\pi/2$ radians phase shift and detect the amplitude. Phase detection is done by using these two amplitude values. (c) Main components of Mirage architecture with four RNS-MMVMUs and three moduli as an example.
Figure 5: (a) ResNet18 validation accuracy on ImageNet after training from scratch for 60 epochs and (b) energy per MAC operation (pJ/MAC) for varying $b_m$ and $g$. This analysis includes energy consumed by lasers and tuning circuitry, TIAs, DACs and ADCs, FP-BFP, and RNS-BNS conversions. ResNet18 is shown as an example; we observed similar behavior for the other evaluated DNNs.
...and 4 more figures
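The captions for Figures 3 and 4(b) describe modular multiplication with phase shifters and phase detection from two amplitude readings. A heavily idealized numerical sketch of that idea follows. It assumes a noiseless unit-amplitude field, a phase of $2\pi x w / m$ per product, and that the direct and $\pi/2$-shifted arms yield the cosine and sine components of the field; the function names are hypothetical and the model is not taken from the paper.

```python
import cmath
import math

def phase_encoded_dot(xs, ws, m):
    """Idealized optical field after accumulating a phase of 2*pi*x*w/m per product.

    Phase wraps every 2*pi, so the accumulated phase encodes sum(x*w) mod m.
    """
    phase = sum(2 * math.pi * x * w / m for x, w in zip(xs, ws))
    return cmath.exp(1j * phase)  # assumed unit amplitude, no loss or noise

def detect_residue(field, m):
    """Recover the residue from two readings, as in quadrature detection."""
    in_phase = field.real    # assumed reading of the direct arm
    quadrature = field.imag  # assumed reading of the pi/2-shifted arm
    phi = math.atan2(quadrature, in_phase) % (2 * math.pi)
    # Quantize the phase to the nearest of m levels; the final % m folds 2*pi back to 0.
    return round(phi * m / (2 * math.pi)) % m

xs, ws = [3, 5, 7], [2, 6, 4]
print(detect_residue(phase_encoded_dot(xs, ws, 7), 7))  # 1, since 64 mod 7 == 1
```

In this picture the modulo operation comes for free from the periodicity of phase, and the detector only has to distinguish $m$ phase levels rather than resolve the full-precision result.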
