Mirage: An RNS-Based Photonic Accelerator for DNN Training
Cansu Demirkiran, Guowei Yang, Darius Bunandar, Ajay Joshi
TL;DR
Mirage addresses the precision bottleneck of photonic DNN training by combining Block Floating Point (BFP) with Residue Number System (RNS) arithmetic, so that high-precision operations are carried out as low-precision modular operations in the analog photonic core. It introduces a novel photonic micro-architecture (MMU/MDPU/MMVMU) and a three-moduli RNS dataflow that computes modular GEMMs and reconstructs results via the Chinese Remainder Theorem (CRT), enabling training accuracy comparable to FP32 for state-of-the-art DNNs. On average across several DNNs and compared to systolic arrays, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower energy-delay product (EDP) in an iso-energy scenario, and consumes $42.8\times$ lower power with comparable or better EDP in an iso-area scenario. These results suggest that hybrid RNS-BFP photonic accelerators can deliver both energy efficiency and precision for large-scale DNN training, with potential extensions to inference and other analog platforms.
Abstract
Photonic computing is a compelling avenue for performing highly efficient matrix multiplication, a crucial operation in Deep Neural Networks (DNNs). While this method has shown great success in DNN inference, meeting the high precision demands of DNN training proves challenging due to the precision limitations imposed by costly data converters and the analog noise inherent in photonic hardware. This paper proposes Mirage, a photonic DNN training accelerator that overcomes the precision challenges in photonic hardware using the Residue Number System (RNS). RNS is a numeral system based on modular arithmetic, allowing us to perform high-precision operations via multiple low-precision modular operations. In this work, we present a novel micro-architecture and dataflow for an RNS-based photonic tensor core performing modular arithmetic in the analog domain. By combining RNS and photonics, Mirage provides high energy efficiency without compromising precision and can successfully train state-of-the-art DNNs, achieving accuracy comparable to FP32 training. Our study shows that, on average across several DNNs and compared to systolic arrays, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower energy-delay product (EDP) in an iso-energy scenario, and consumes $42.8\times$ lower power with comparable or better EDP in an iso-area scenario.
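To make the RNS idea concrete, the following is a minimal software sketch of how one integer dot product can be split into several low-precision modular dot products and then recovered with the Chinese Remainder Theorem. It is an illustration only, not the Mirage hardware dataflow: the moduli `(31, 32, 33)`, the helper names (`to_rns`, `modular_dot`, `crt`, `to_signed`, `rns_dot`), and the signed-range mapping are all assumptions chosen for the example.

```python
from math import prod

# Assumed example moduli (hypothetical choice): pairwise coprime,
# each residue fits in about 5 to 6 bits.
MODULI = (31, 32, 33)
M = prod(MODULI)  # dynamic range of the residue representation (32736)


def to_rns(x):
    """Represent an integer by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def modular_dot(a, b, m):
    """Dot product in which every intermediate value stays below m."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + (x % m) * (y % m)) % m
    return acc


def crt(residues):
    """Chinese Remainder Theorem: recover the value in [0, M) from residues."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def to_signed(x):
    """Map [0, M) onto the signed range [-M/2, M/2)."""
    return x - M if x >= M // 2 else x


def rns_dot(a, b):
    """One narrow modular dot product per modulus, then CRT reconstruction."""
    residues = tuple(modular_dot(a, b, m) for m in MODULI)
    return to_signed(crt(residues))


a = [3, -7, 12, 5]
b = [-4, 9, 2, -11]
print(rns_dot(a, b), sum(x * y for x, y in zip(a, b)))
```

The reconstruction is exact only while the true result lies inside the signed range of width `M`. In this sketch the three modular dot products are independent of one another, which is what allows each to run on a separate low-precision channel.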
