Predictive Performance of Photonic SRAM-based In-Memory Computing for Tensor Decomposition

Sasindu Wijeratne, Sugeet Sunder, Md Abdullah-Al Kaiser, Akhilesh Jaiswal, Clynn Mathew, Ajey P. Jacob, Viktor Prasanna

TL;DR

This work targets MTTKRP, the bottleneck kernel of tensor decomposition based on canonical polyadic decomposition (CPD), by proposing a scalable photonic SRAM (pSRAM) array embedded in an optical in-memory compute engine. The architecture combines WDM hyperspectral encoding, cross-coupled microring resonator bitcells, and comb-based modulation to enable ultra-fast, low-energy operations, and it maps the CPD primitives CP1–CP3 onto the pSRAM array to carry out MTTKRP. A predictive performance model estimates sustained performance of up to 17 PetaOps at 8-bit precision in a practical 52-channel, 20 GHz configuration, highlighting the potential of optical memory-compute co-design to overcome memory-bandwidth limitations. The results point to practical benefits for data-intensive tasks such as tensor decomposition in machine learning, signal processing, and bioinformatics, through reduced data movement and high-throughput, scalable photonic processing.

Abstract

Photonics-based in-memory computing systems have demonstrated a significant speedup over traditional transistor-based systems because of their ultra-fast operating frequencies and high data bandwidths. Photonic static random access memory (pSRAM) is a crucial component for achieving the objective of ultra-fast photonic in-memory computing systems. In this work, we model and evaluate the performance of a novel photonic SRAM array architecture in development. Additionally, we examine hyperspectral operation through wavelength division multiplexing (WDM) to enhance the throughput of the pSRAM array. We map Matricized Tensor Times Khatri-Rao Product (MTTKRP), a computational kernel commonly used in tensor decomposition, to the proposed pSRAM array architecture. We also develop a predictive performance model to estimate the sustained performance of different configurations of the pSRAM array. Using the predictive performance model, we demonstrate that the pSRAM array achieves 17 PetaOps while performing MTTKRP in a practical hardware configuration.
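
For readers unfamiliar with the kernel named in the abstract, the following is a minimal software sketch of mode-0 MTTKRP on a 3-mode sparse tensor, using the standard definition out[i][r] = Σ X[i,j,k] · B[j][r] · C[k][r]. It is a plain-Python reference for the arithmetic only, not the paper's pSRAM mapping; the function name `mttkrp_mode0`, the coordinate-list input layout, and the toy data are assumptions made for illustration.

```python
def mttkrp_mode0(nonzeros, B, C, num_rows):
    """Mode-0 MTTKRP for a 3-mode sparse tensor (illustrative sketch).

    nonzeros: list of (i, j, k, value) tuples, one per nonzero entry of X.
    B, C:     factor matrices for modes 1 and 2, given as lists of rows,
              each row holding `rank` values.
    num_rows: size of mode 0, i.e. the number of output rows.
    """
    rank = len(B[0])
    out = [[0.0] * rank for _ in range(num_rows)]
    for i, j, k, value in nonzeros:
        row_b, row_c = B[j], C[k]
        for r in range(rank):
            # One multiply-accumulate per (nonzero, rank column) pair.
            out[i][r] += value * row_b[r] * row_c[r]
    return out


# Hypothetical toy input: a 2 x 2 x 2 tensor with three nonzeros, rank 2.
nonzeros = [(0, 0, 0, 1.0), (0, 1, 1, 2.0), (1, 0, 1, 3.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
result = mttkrp_mode0(nonzeros, B, C, num_rows=2)
print(result)  # [[47.0, 76.0], [21.0, 48.0]]
```

Each nonzero touches one row of every factor matrix, so the kernel performs little arithmetic per byte fetched. That data-movement cost is what an in-memory compute array aims to reduce.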

Paper Structure

This paper contains 16 sections, 1 equation, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: (i) Schematic of the proposed computing engine. Optical frequency combs (OFC) are used to generate precise wavelength channels, which are then modulated using high-speed comb-shapers. The input, encoded across multiple independent wavelength channels, is sent into the word-line and multiplied with a memory bit. Different ring modulators (G/B/R/Y) are employed to handle different sets of wavelengths, with the resonances of the other three resonators spaced within the free spectral range (FSR) of one resonator. An analog output is received on the bit-line for further processing. (ii) The drop-port transmission characteristics of the compute ring modulators indicate the spacing of the wavelength channels used for WDM.
  • Figure 2: Grid representation of pSRAM array.
  • Figure 3: Mapping CP1 to pSRAM array.
  • Figure 4: Mapping CP2 and CP3 to pSRAM array.
  • Figure 5: (i) Impact of wavelength channels. (ii) Impact of operating frequency.