ENLighten: Lighten the Transformer, Enable Efficient Optical Acceleration

Hanqing Zhu, Zhican Zhou, Shupeng Ning, Xuhao Wu, Ray Chen, Yating Wan, David Pan

TL;DR

ENLighten addresses the inefficiency of scaling Transformers on photonic hardware by tightly co‑designing software compression with hardware acceleration. Lighten produces a PTC‑aware, low‑rank plus structured sparse representation that preserves fidelity with minimal fine‑tuning, while ENLighten provides a reconfigurable sparse engine and broadband light redistribution to exploit the compressed structure. The approach yields up to 50% parameter reduction with ~1% accuracy loss after a few epochs and delivers about a 2.5× improvement in energy–delay product on a Base‑scale ViT, highlighting a path toward practical optics‑based acceleration for large AI models. Together, the work demonstrates a viable hardware–software co‑design route to scale photonic inference for state‑of‑the‑art Transformers with meaningful performance and energy benefits.

Abstract

Photonic computing has emerged as a promising substrate for accelerating the dense linear-algebra operations at the heart of AI, yet adoption for large Transformer models remains in its infancy. We identify two bottlenecks: (1) costly electro--optic conversions and data-movement overheads that erode energy efficiency as model sizes scale; (2) a mismatch between limited on-chip photonic resources and Transformer scale, which forces frequent reuse of photonic tensor cores and dilutes throughput gains. To address these challenges, we introduce a hardware--software co-design framework. First, we propose \texttt{Lighten}, a PTC-aware compression flow that post-hoc decomposes each Transformer weight matrix into a low-rank component plus a structured-sparse component aligned to photonic tensor-core granularity, without lengthy retraining. Second, we present \texttt{ENLighten}, a reconfigurable photonic accelerator with dynamically adaptive tensor cores, driven by broadband light redistribution, enabling fine-grained sparsity support and full power gating of inactive parts. On ImageNet, \texttt{Lighten} prunes a Base-scale Vision Transformer by 50\% with $\approx$1\% accuracy drop after only 3 epochs (about 1 hour) of fine-tuning. Deployed on \texttt{ENLighten}, it achieves a $2.5\times$ improvement in energy--delay product over the state-of-the-art photonic Transformer accelerator.
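
The decomposition described above (a low-rank component plus a structured-sparse component aligned to tensor-core granularity) can be illustrated with a small NumPy sketch. This is not the authors' Lighten algorithm: the truncated SVD, the Frobenius-norm ranking of residual tiles, and the names `lowrank_plus_block_sparse`, `rank`, `block`, and `keep_ratio` are all assumptions chosen for illustration, and the 8x8 tile size is a hypothetical stand-in for the PTC granularity.

```python
import numpy as np

def lowrank_plus_block_sparse(W, rank, block, keep_ratio):
    """Split W into a low-rank part (A @ B) and a block-sparse residual S.

    Assumptions (not taken from the paper): truncated SVD gives the
    low-rank part, and residual tiles are ranked by Frobenius norm.
    """
    rows, cols = W.shape
    assert rows % block == 0 and cols % block == 0, "W must tile evenly"

    # Low-rank component from the top `rank` singular triplets.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # shape (rows, rank)
    B = Vt[:rank, :]                    # shape (rank, cols)
    R = W - A @ B                       # residual left after the low-rank fit

    # View the residual as a grid of block x block tiles and score each tile.
    br, bc = rows // block, cols // block
    tiles = R.reshape(br, block, bc, block)
    scores = np.sqrt((tiles ** 2).sum(axis=(1, 3)))   # shape (br, bc)

    # Keep the highest-scoring tiles whole; zero out all other tiles.
    n_keep = int(round(keep_ratio * br * bc))
    flat_mask = np.zeros(br * bc, dtype=bool)
    if n_keep > 0:
        flat_mask[np.argsort(scores.ravel())[::-1][:n_keep]] = True
    mask = flat_mask.reshape(br, bc)
    S = (tiles * mask[:, None, :, None]).reshape(rows, cols)
    return A, B, S, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))       # stand-in for one weight matrix
A, B, S, mask = lowrank_plus_block_sparse(W, rank=8, block=8, keep_ratio=0.25)

# Stored parameters: the two low-rank factors plus the surviving tiles.
params = A.size + B.size + int(mask.sum()) * 8 * 8
rel_err = np.linalg.norm(W - (A @ B + S)) / np.linalg.norm(W)
print(f"kept {params}/{W.size} parameters, relative error {rel_err:.3f}")
```

With these hypothetical settings the factors and surviving tiles hold 2048 of the original 4096 parameters, i.e. a 50% reduction. A random Gaussian matrix compresses poorly, so the reconstruction error printed here says nothing about the accuracy the paper reports on trained weights.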

Paper Structure

This paper contains 23 sections, 12 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: (left) State-of-the-art PTC design in a photonic Transformer accelerator; (right) its diminishing speedup and energy-efficiency benefits when moving from DeiT-Small to DeiT-Base; the energy cost is bottlenecked by data movement and by weight digital-to-analog conversion and modulation.
  • Figure 2: Overview of the Lighten flow, which trims a large model into a slimmer one for energy efficiency and speedup on photonic AI engines.
  • Figure 3: Starting from a dense PTC, (a) unstructured element-wise sparsity provides no benefit because the computation stays dense. (b) Row pruning and (c) column pruning may not improve throughput and can lead to accuracy loss. (d) Our design balances energy savings, speedup, and performance.
  • Figure 4: (a) Broadband power redistribution unit using MMI for $\lambda$-tolerant splitting. (b) Transmission drift across phase settings and total channel wavelength span. (c) Mode profile verification of equal and full power splitting scenarios.
  • Figure 5: (a) Reconfigurable PTC with adaptive operating granularity; (b) Sparse engine with reconfigurable PTC (RPTC) and the corresponding input data fetcher to support our condensed sparse matrix; (c) Dense engine for uncompressed and low-rank factorized layers.
  • ...and 3 more figures
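
The contrast in the Figure 3 caption between unstructured sparsity and sparsity aligned to tensor-core granularity can be illustrated with a toy cost model. The sketch below assumes, purely for illustration, that a tile must be processed whenever it contains at least one nonzero weight; the 32x32 matrix, the 8x8 tile size, the checkerboard pattern, and the helper `active_tiles` are hypothetical and not taken from the paper.

```python
import random

def active_tiles(mask, block):
    """Count block x block tiles that hold at least one nonzero entry."""
    rows, cols = len(mask), len(mask[0])
    count = 0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            if any(mask[r][c]
                   for r in range(r0, r0 + block)
                   for c in range(c0, c0 + block)):
                count += 1
    return count

random.seed(0)
N, BLOCK = 32, 8                      # hypothetical matrix and tile sizes
total_tiles = (N // BLOCK) ** 2       # 16 tiles in the dense case

# Unstructured 50% sparsity: each weight is kept independently.
unstructured = [[random.random() < 0.5 for _ in range(N)] for _ in range(N)]

# Tile-aligned 50% sparsity: whole tiles kept in a checkerboard pattern.
structured = [[((r // BLOCK) + (c // BLOCK)) % 2 == 0 for c in range(N)]
              for r in range(N)]

print("dense tiles:       ", total_tiles)
print("unstructured tiles:", active_tiles(unstructured, BLOCK))
print("structured tiles:  ", active_tiles(structured, BLOCK))
```

Both masks keep roughly half of the weights, but only the tile-aligned mask empties whole tiles (8 of 16 here), which is the property that lets a tile-based engine skip or power-gate work under this simplified model.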