TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator

Meng Zhang, Dennis Yin, Nicholas Gangi, Amir Begović, Alexander Chen, Zhaoran Rena Huang, Jiaqi Gu

TL;DR

TeMPO addresses the energy and bandwidth challenges of edge AI with a time-multiplexed dynamic photonic tensor core built on cross-layer device, circuit, and architecture customization. It combines a compact slow-light MZM input encoder, tailored optical splitters and phase shifters, hierarchical analog partial-product accumulation via capacitive temporal integration, and a multi-core tiled layout that maximizes hardware sharing and minimizes ADC overhead. The design reaches 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density on representative edge workloads with 6-bit quantization, while remaining robust to hardware noise. The work shows that deliberate cross-layer design is a practical path to Pareto-frontier electronic-photonic accelerators for real-time, energy-efficient edge inference.

Abstract

Electronic-photonic computing systems offer immense potential in energy-efficient artificial intelligence (AI) acceleration tasks due to the superior computing speed and efficiency of optics, especially for real-time, low-energy deep neural network (DNN) inference tasks on resource-restricted edge platforms. However, current optical neural accelerators based on foundry-available devices and conventional system architecture still encounter a performance gap compared to highly customized electronic counterparts. To bridge the performance gap due to lack of domain specialization, we present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. At the device level, we present foundry-compatible, customized photonic devices, including a slow-light electro-optic modulator with experimental demonstration, optical splitters, and phase shifters that significantly reduce the footprint and power in input encoding and dot-product calculation. At the circuit level, partial products are hierarchically accumulated via parallel photocurrent aggregation, lightweight capacitive temporal integration, and sequential digital summation, considerably relieving the analog-to-digital conversion bottleneck. We also employ a multi-tile, multi-core architecture to maximize hardware sharing for higher efficiency. Across diverse edge AI workloads, TeMPO delivers digital-comparable task accuracy with superior quantization/noise tolerance. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density, pushing the Pareto frontier in edge AI hardware. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators with even greater performance and efficiency.
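
To make the circuit-level idea concrete, the following is a minimal numerical sketch of three-level partial-product accumulation. It is not the authors' implementation: the function name `hierarchical_dot`, the parameters `n_parallel`, `n_integrate`, `adc_bits`, and `full_scale`, and the uniform ADC model are all assumptions introduced for illustration, and inputs are assumed to lie in [-1, 1]. The point it illustrates is that summing in the analog domain first (across parallel lanes, then across time) means only one conversion is needed per integration window rather than one per product.

```python
import numpy as np

def hierarchical_dot(x, y, n_parallel=4, n_integrate=3, adc_bits=6, full_scale=None):
    """Toy model of a dot product accumulated in three stages.

    All names and the ADC model are illustrative assumptions.
    Returns (result, number_of_adc_conversions).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    assert x.ndim == 1 and x.shape == y.shape

    # Zero-pad so the vector splits evenly into integration windows.
    group = n_parallel * n_integrate
    pad = (-x.size) % group
    x = np.pad(x, (0, pad))
    y = np.pad(y, (0, pad))

    # Shape: (windows, cycles per window, parallel lanes per cycle).
    products = (x * y).reshape(-1, n_integrate, n_parallel)

    # Stage 1: parallel aggregation, lanes summed within one cycle.
    per_cycle = products.sum(axis=2)

    # Stage 2: temporal integration, cycles summed within one window.
    per_window = per_cycle.sum(axis=1)

    # One ADC conversion per window (uniform quantizer, assumed model).
    if adc_bits is None:
        digitized = per_window
    else:
        fs = float(group) if full_scale is None else float(full_scale)
        step = 2.0 * fs / (2 ** adc_bits - 1)
        digitized = np.round(np.clip(per_window, -fs, fs) / step) * step

    # Stage 3: sequential digital summation of the converted windows.
    return float(digitized.sum()), int(digitized.size)


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 24)
y = rng.uniform(-1, 1, 24)
approx, conversions = hierarchical_dot(x, y)
print(conversions, approx, float(np.dot(x, y)))
```

In this sketch a length-24 dot product needs 2 conversions instead of 24, at the cost of a wider analog range per conversion. How that trade-off is balanced in the actual hardware is determined by the paper's design, not by this toy model.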

Paper Structure

This paper contains 24 sections, 18 equations, 19 figures, and 3 tables.

Figures (19)

  • Figure 1: Our versatile, reconfigurable, cross-stack customized photonic accelerator TeMPO achieves digital-comparable accuracy with 22.3 TOPS/W efficiency on edge AI.
  • Figure 2: Schematic of a dynamic optical dot-product engine.
  • Figure 3: Our designed multi-core time-multiplexed dynamic photonic tensor accelerator TeMPO. ➊-➌ correspond to the hierarchical partial product accumulation in Eq. \ref{eq:PartialProduct}. All $R$ PTCs in a column share the same $Y$ matrix MZMs. All $C$ PTCs in a row share the same readout circuitry.
  • Figure 4: Schematic of our proposed time-multiplexed double-layer-splitter tensor core TeMPO-D. $K=3$ is sketched here as an example for illustration.
  • Figure 5: Schematic of our proposed time-multiplexed embedded-uneven-splitter tensor core TeMPO-E. $K=3$ is sketched here as an example for illustration.
  • ...and 14 more figures