Predictive Performance of Photonic SRAM-based In-Memory Computing for Tensor Decomposition
Sasindu Wijeratne, Sugeet Sunder, Md Abdullah-Al Kaiser, Akhilesh Jaiswal, Clynn Mathew, Ajey P. Jacob, Viktor Prasanna
TL;DR
This work targets Matricized Tensor Times Khatri-Rao Product (MTTKRP), the bottleneck kernel of canonical polyadic decomposition (CPD)-based tensor decomposition, by proposing a scalable photonic SRAM (pSRAM) array embedded in an optical in-memory compute engine. The architecture combines wavelength division multiplexing (WDM) hyperspectral encoding, cross-coupled microring resonator bitcells, and comb-based modulation to enable ultra-fast, low-energy operations, and it maps the CPD primitives CP1–CP3 onto the pSRAM array to compute MTTKRP. A predictive performance model estimates sustained performance of up to 17 PetaOps at 8-bit precision in a practical 52-channel, 20 GHz configuration, highlighting the potential of optical memory-compute co-design to overcome memory-bandwidth limitations. The findings suggest that reducing data movement and enabling high-throughput, scalable photonic processing could accelerate data-intensive tasks such as tensor decomposition in machine learning, signal processing, and bioinformatics.
Abstract
Photonics-based in-memory computing systems have demonstrated a significant speedup over traditional transistor-based systems because of their ultra-fast operating frequencies and high data bandwidths. Photonic static random access memory (pSRAM) is a crucial component for achieving the objective of ultra-fast photonic in-memory computing systems. In this work, we model and evaluate the performance of a novel photonic SRAM array architecture in development. Additionally, we examine hyperspectral operation through wavelength division multiplexing (WDM) to enhance the throughput of the pSRAM array. We map Matricized Tensor Times Khatri-Rao Product (MTTKRP), a computational kernel commonly used in tensor decomposition, to the proposed pSRAM array architecture. We also develop a predictive performance model to estimate the sustained performance of different configurations of the pSRAM array. Using the predictive performance model, we demonstrate that the pSRAM array achieves 17 PetaOps while performing MTTKRP in a practical hardware configuration.
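For readers unfamiliar with the kernel, the following is a minimal software sketch of MTTKRP for a 3-way sparse tensor, using the standard definition M[i][r] = sum over j, k of X[i,j,k] * B[j][r] * C[k][r]. It is not the pSRAM mapping described in this work. The function name `mttkrp_mode0`, the coordinate-list tensor format, and the example values are assumptions chosen for illustration only.

```python
def mttkrp_mode0(nonzeros, B, C, num_rows):
    """Mode-0 MTTKRP for a 3-way sparse tensor in coordinate (COO) form.

    nonzeros: iterable of (i, j, k, value) entries of the tensor X.
    B, C: factor matrices (lists of rows) of shapes J x R and K x R.
    Returns M (num_rows x R) with
        M[i][r] = sum over j, k of X[i, j, k] * B[j][r] * C[k][r].
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for i, j, k, value in nonzeros:
        # Each nonzero scales the elementwise product of one row of B
        # and one row of C, then accumulates into output row i.
        for r in range(rank):
            M[i][r] += value * B[j][r] * C[k][r]
    return M


# Hypothetical example: a 2 x 2 x 2 tensor with three nonzeros, rank R = 2.
X = [(0, 0, 0, 1.0), (0, 1, 1, 2.0), (1, 0, 1, 3.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = mttkrp_mode0(X, B, C, num_rows=2)
print(M)  # [[47.0, 76.0], [21.0, 48.0]]
```

Each nonzero contributes R multiply-accumulate steps that read one row from each factor matrix, so on conventional hardware the kernel tends to be limited by memory traffic rather than arithmetic. This is the data-movement cost that an in-memory compute approach aims to reduce.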
