SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution

Ziang Yin, Nicholas Gangi, Meng Zhang, Jeff Zhang, Rena Huang, Jiaqi Gu

TL;DR

This work tackles three challenges in photonic AI accelerators: thermal sensitivity, electrical-optical conversion power, and limited reconfigurability. It introduces SCATTER, a dynamically reconfigurable photonic tensor core that enables algorithm-circuit co-sparsity through in-situ light redistribution, power gating, and cross-layer optimization, including a dynamic sparse training (DST) framework. Key contributions include phase-insensitive incoherent tensor cores, on-chip tunable light rerouting, TIA/ADC gating, and a hybrid eoDAC design, which together achieve up to $511\times$ area reduction and $12.4\times$ on-chip power savings. The results demonstrate robust performance under thermal crosstalk and point to a generalizable design framework for scalable, energy-efficient photonic AI systems.

Abstract

Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines for power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great opportunity for hardware-efficient AI accelerators, but current dense photonic accelerators fail to fully exploit the power-saving potential of algorithmic sparsity. Exploiting it requires sparsity-aware hardware specialization, with a fundamental redesign of the photonic tensor core topology and cross-layer device-circuit-architecture-algorithm co-optimization that accounts for hardware non-idealities and power bottlenecks. To trim redundant power consumption while maximizing robustness to thermal variations, we propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal paths via thermal-tolerant, power-efficient in-situ light redistribution and power gating. A power-optimized, crosstalk-aware dynamic sparse training framework is introduced to explore row-column structured sparsity, ensuring marginal accuracy loss and maximum power efficiency. Extensive evaluation shows that our cross-stacked optimized accelerator SCATTER achieves a $511\times$ area reduction and $12.4\times$ power saving with superior crosstalk tolerance, enabling unprecedented circuit layout compactness and on-chip power efficiency.

Paper Structure (26 sections, 14 equations, 10 figures, 3 tables, 1 algorithm)

This paper contains 26 sections, 14 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Our proposed SCATTER architecture co-explores circuit/algorithm sparsity with power efficiency and robustness co-optimization compared to generic dense tensor cores.
  • Figure 2: Dynamic multi-core photonic accelerator architecture with $R$ tiles and $C$ PTCs per tile. Each PTC is of size $k_1 \times k_2$. Input modulation modules are shared by $r$ PTCs across different tiles. Readout circuitry is shared by $c$ PTCs in a tile.
  • Figure 3: Schematic of phase-agnostic incoherent PTC.
  • Figure 4: (a) Inter- and intra-MZI thermal crosstalk are modeled by distance-related coupling coefficients $\gamma$. (b) Lumerical HEAT simulation is used to sweep various phase shifter spacings and fit a numerical crosstalk model. (c) Larger arm spacing $l_s$ reduces the required MZI power to realize the same phase difference. (d) Larger MZI spacing $l_h$ reduces normalized mean-absolute error (N-MAE) on phases and weights. (e) Impact of arm spacing and MZI spacing on area, power, and crosstalk.
  • Figure 5: Weight block column-wise sparsity can be supported by an on-chip light rerouter with in-situ tunable light splitting ratios. Here, we show an $8\times 8$ block as an example. Input gating saves significant high-speed DAC and input modulation power while reducing leakage error in pruned paths. Light redistribution eliminates leakage errors and provides light power to unpruned computing engines with higher optical SNR. Unlike the tree structure in the schematic, a folded rerouter layout is designed to save area. Refocusing can effectively reduce computing N-MAE errors compared to standard weight pruning.
  • ...and 5 more figures
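
The column-wise light redistribution described in the Figure 5 caption can be illustrated with a small numerical sketch. This is not the authors' implementation: the function name `redistribute_light`, the lossless rerouter, and the equal split among unpruned columns are all assumptions made for illustration, and the real rerouter uses tunable splitting ratios that need not be uniform.

```python
import numpy as np

def redistribute_light(col_mask, total_power=1.0):
    """Hypothetical helper: split input light over unpruned columns only.

    col_mask    -- 1/True for columns kept, 0/False for pruned columns
    total_power -- total optical power entering the rerouter (assumed lossless)
    Returns the optical power delivered to each column.
    """
    col_mask = np.asarray(col_mask, dtype=bool)
    n_active = int(col_mask.sum())
    if n_active == 0:
        # Every column is pruned: no light is delivered anywhere.
        return np.zeros(col_mask.shape, dtype=float)
    # Assumption: equal split among active columns, zero light to pruned ones.
    return np.where(col_mask, total_power / n_active, 0.0)

# Example: an 8-column block with 4 columns pruned.
mask = [1, 0, 1, 1, 0, 0, 1, 0]
sparse_power = redistribute_light(mask)
dense_power = redistribute_light([1] * 8)  # uniform split with no pruning
```

In this example each of the four unpruned columns receives 0.25 of the input power, twice the 0.125 it would receive under a uniform eight-way split. That increase is the intuition behind the caption's claim of higher optical SNR on unpruned paths.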