SKYLIGHT: A Scalable Hundred-Channel 3D Photonic In-Memory Tensor Core Architecture for Real-time AI Inference

Meng Zhang, Ziang Yin, Nicholas Gangi, Alexander Chen, Brett Bamfo, Tianle Xu, Jiaqi Gu, Zhaoran Rena Huang

TL;DR

System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities (low-bit quantization and signal-proportional analog noise capturing modulation, PCM programming, and readout variations).
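The non-idealities named above (low-bit quantization plus signal-proportional analog noise) can be illustrated with a small, standard-library-only sketch of a noisy matrix-vector product. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the 4-bit default, the 2% relative noise level, the restriction of values to [0, 1], and the choice to inject noise at the weight, input, and accumulated-output stages.

```python
import random

def quantize(x, bits, x_max=1.0):
    """Uniformly quantize x, clipped to [0, x_max], onto 2**bits levels."""
    steps = 2 ** bits - 1
    x = min(max(x, 0.0), x_max)
    return round(x / x_max * steps) / steps * x_max

def add_proportional_noise(x, rel_sigma, rng):
    """Gaussian noise whose standard deviation scales with |x|."""
    return x + rng.gauss(0.0, rel_sigma * abs(x))

def noisy_matvec(weights, inputs, bits=4, rel_sigma=0.02, seed=0):
    """Matrix-vector product with quantized, noisy operands (assumed model)."""
    rng = random.Random(seed)
    outputs = []
    for row in weights:
        acc = 0.0
        for w, x in zip(row, inputs):
            # Stand-in for PCM programming variation on the stored weight.
            w_noisy = add_proportional_noise(quantize(w, bits), rel_sigma, rng)
            # Stand-in for modulation variation on the encoded input.
            x_noisy = add_proportional_noise(quantize(x, bits), rel_sigma, rng)
            acc += w_noisy * x_noisy
        # Stand-in for readout variation on the accumulated signal.
        outputs.append(add_proportional_noise(acc, rel_sigma, rng))
    return outputs

if __name__ == "__main__":
    W = [[0.2, 0.8, 0.5], [1.0, 0.0, 0.3]]
    x = [0.9, 0.1, 0.6]
    print("ideal:", [sum(w * v for w, v in zip(row, x)) for row in W])
    print("noisy:", noisy_matvec(W, x))
```

Noise-aware training, in this picture, would mean running a perturbed forward pass of this kind during training so the learned weights tolerate the perturbation; the sketch covers only the forward pass.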

Abstract

The growing computational demands of artificial intelligence (AI) are challenging conventional electronics, making photonic computing a promising alternative. However, existing photonic architectures face fundamental scalability and reliability barriers. This paper introduces SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture designed for real-time AI inference. By co-designing its topology, wavelength routing, accumulation, and programming in a 3D stack, SKYLIGHT overcomes these barriers. Its innovations include a low-loss 3D Si/SiN crossbar topology, a thermally robust wavelength-division multiplexing (WDM) component that does not rely on micro-ring resonators (MRRs), hierarchical signal accumulation using a multi-port photodetector (PD), and optically programmed non-volatile phase-change material (PCM) weights. Importantly, SKYLIGHT enables in-situ weight updates that support label-free, layer-local learning (e.g., forward-forward local updates) in addition to inference. Using SimPhony for system-level modeling, we show that a single 144 × 256 SKYLIGHT core is feasible within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W, enabling ResNet-50 inference at 1212 FPS with 27 mJ per image, and achieves 84.17 FPS/W end-to-end (1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU) under the same workload in real-time measurements. System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities (low-bit quantization and signal-proportional analog noise capturing modulation, PCM programming, and readout variations). With noise-aware training, SKYLIGHT maintains high task accuracy, validating its potential as a comprehensive solution for energy-efficient, large-scale photonic AI accelerators.
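For readers who want to relate the headline figures to one another, the short sketch below performs the unit conversions they imply. The constants are the values quoted in the abstract; the variable and function names are hypothetical, and reading "1.61 x higher" as a plain ratio of FPS/W values is an assumption.

```python
# Headline figures quoted in the abstract.
PEAK_TOPS = 342.1           # tera-operations per second
TOPS_PER_WATT = 23.7
RESNET50_FPS = 1212.0       # frames per second
ENERGY_PER_IMAGE_J = 27e-3  # 27 mJ per image
FPS_PER_WATT = 84.17        # end-to-end figure
RATIO_VS_GPU = 1.61         # assumed to be a ratio of FPS/W values

def summarize():
    """Derive the quantities implied by each quoted figure."""
    return {
        # Power implied by peak throughput and compute efficiency.
        "power_from_tops_w": PEAK_TOPS / TOPS_PER_WATT,
        # Average power implied by energy per image at the quoted frame rate.
        "power_from_energy_w": ENERGY_PER_IMAGE_J * RESNET50_FPS,
        # Energy per frame implied by the end-to-end FPS/W figure, in mJ.
        "energy_from_fps_per_w_mj": 1e3 / FPS_PER_WATT,
        # GPU baseline efficiency implied by the quoted ratio.
        "gpu_fps_per_w": FPS_PER_WATT / RATIO_VS_GPU,
    }

if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

The derived values need not agree with one another: the abstract does not say which components or operating conditions each figure covers, so they are best read as separate measurements rather than as a single power budget.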

Paper Structure (23 sections, 9 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: (a) Schematic of the SKYLIGHT tensor core architecture; (b) illustration of the 3D fanout network and hierarchical accumulation; (c) schematic of a multi-port Ge PD.
  • Figure 2: (a) Schematic of a Bragg grating-assisted wavelength-selective coupler (WSC) on an SOI substrate, which uses contra-directional coupling to add an optical signal from Port 4 to the bus waveguide. (b) Transmission spectrum of three cascaded WSCs at Port 2 when Port 4 serves as the input port. $\lambda_0$ is the original wavelength carried by the bus waveguide, and $\lambda_1 \sim \lambda_3$ are the wavelengths coupled onto the bus waveguide by the three WSCs, respectively.
  • Figure 3: Heterogeneous integration of VCSEL arrays above a SiN/Si photonic computing substrate. Vertical optical coupling is enabled near the PIC, whereas electrical connections are routed through the backend metal stack to the RF front-end EIC (Tx/Rx).
  • Figure 4: Schematic of the vertically programmed integrated phase-change photonic memory cell. A silicon waveguide carries the signal light at $\lambda = 1550\ \mathrm{nm}$ from the optical input to the optical output, while a vertically incident VCSEL programming beam at $\lambda = 1064\ \mathrm{nm}$ is coupled from above to switch the phase-change material. The N-GST memory element is positioned at the waveguide intersection with dimensions of $2.5\ \mu\mathrm{m} \times 3\ \mu\mathrm{m}$, enabling localized optical modulation through vertical optical programming and in-plane signal transmission.
  • Figure 5: (a) Schematic of the vertical coupling configuration. (b) Inverse-designed SiN etch pattern. (c) Inverse-designed Si etch pattern. (d) Electric-field intensity distribution in the x-y plane at $z = 0.90\ \mu\mathrm{m}$ in the middle of the SiN layer. (e) Electric-field intensity distribution in the x-y plane at $z = 0.58\ \mu\mathrm{m}$ in the middle of the Si layer. (f) Electric-field intensity distribution in the x-z plane at $y = 0\ \mu\mathrm{m}$.
  • ...and 7 more figures