Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accelerator

Hanqing Zhu, Jiaqi Gu, Hanrui Wang, Zixuan Jiang, Zhekai Zhang, Rongxing Tang, Chenghao Feng, Song Han, Ray T. Chen, David Z. Pan

TL;DR

Lightening-Transformer is a dynamically-operated photonic Transformer accelerator that overcomes core limitations of prior photonic designs. It uses a coherent, full-range dot-product engine (DDot) and a crossbar photonic tensor core (DPTC) to perform dynamic matrix multiplication (MM) in one shot. The architecture exploits spectral (WDM) and spatial parallelism, optical broadcast interconnects, and analog-domain temporal accumulation to minimize encoding and A/D conversion costs. It achieves substantial energy and latency advantages over both photonic and electronic baselines while preserving digital-comparable accuracy. Key contributions include the DDot dot-product engine, the DPTC crossbar, architecture-level optimizations (inter-core broadcast and time-integral accumulation), and a detailed evaluation showing >2.6× energy and >12× latency reductions versus prior photonic accelerators and 2 to 3 orders of magnitude lower energy-delay product versus electronic Transformer accelerators. The work highlights the strong potential of photonics for advanced ML workloads, including Transformer-based LLMs, and provides a software-hardware artifact for reproducibility.

Abstract

The wide adoption and significant computing resource demands of attention-based Transformers, e.g., Vision Transformers and large language models (LLMs), have driven the demand for efficient hardware accelerators. There is growing interest in exploring photonics as an alternative technology to digital electronics due to its high energy efficiency and ultra-fast processing speed. Photonic accelerators have shown promising results for CNNs, which mainly rely on weight-static linear operations. However, they struggle to support Transformer architectures efficiently, which calls into question the applicability of photonics to advanced ML tasks. The primary hurdle lies in their inefficiency in handling the unique workloads in Transformers, i.e., dynamic and full-range tensor multiplication. In this work, we propose Lightening-Transformer, the first light-empowered, high-performance, and energy-efficient photonic Transformer accelerator. To overcome prior designs' fundamental limitations, we introduce a novel dynamically-operated photonic tensor core, DPTC, a crossbar array of interference-based optical vector dot-product engines supporting highly parallel, dynamic, and full-range matrix multiplication. Furthermore, we design a dedicated accelerator that integrates our novel photonic computing cores with photonic interconnects for inter-core data broadcast, fully unleashing the power of optics. Comprehensive evaluations show that our design achieves >2.6× energy and >12× latency reductions compared to prior photonic accelerators and delivers the lowest energy cost and 2 to 3 orders of magnitude lower energy-delay product compared to electronic Transformer accelerators, all while maintaining digital-comparable accuracy. Our work highlights the immense potential of photonics for advanced ML workloads, such as Transformer-based LLMs. Our work is available at https://github.com/zhuhanqing/Lightening-Transformer.
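
To make "interference-based, full-range dot product" concrete, the following is a minimal numerical sketch, not the authors' device model. It assumes an idealized lossless 50:50 coupler followed by balanced photodetection, a textbook arrangement in which the difference of the two detected intensities is proportional to the signed product of the two input fields. The function names `ddot` and `dptc_matmul` are hypothetical, and the crossbar is modeled simply as one dot-product engine per output element.

```python
import numpy as np


def ddot(x, y):
    """Signed dot product of two real vectors via idealized interference.

    Assumption: each vector element rides on its own wavelength, and both
    operands may be positive or negative (full range).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ideal 50:50 coupler: the two output ports carry the sum and the
    # difference of the input fields, each scaled by 1/sqrt(2).
    port_plus = (x + y) / np.sqrt(2.0)
    port_minus = (x - y) / np.sqrt(2.0)
    # Each photodetector measures intensity summed over all wavelengths.
    intensity_plus = np.sum(port_plus ** 2)
    intensity_minus = np.sum(port_minus ** 2)
    # Balanced detection: the difference equals 2 * sum(x * y).
    return 0.5 * (intensity_plus - intensity_minus)


def dptc_matmul(a, b):
    """Crossbar model: output (i, j) is the ddot of row i of a and column j of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.empty((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            out[i, j] = ddot(a[i, :], b[:, j])
    return out
```

Because both operands enter as optical signals in this model, neither one has to be programmed into the device ahead of time, which is the property that matters for dynamic products such as attention scores. Noise, loss, and dispersion are ignored here.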

Paper Structure (43 sections, 11 equations, 16 figures, 5 tables)

Figures (16)

  • Figure 1: (a), (b), (c) Prior weight-static photonic tensor core designs (Shen et al., 2017; Tait et al., 2017; Feldmann et al., 2021). (d) Our proposed dynamic photonic tensor core design without static weight constraints.
  • Figure 2: (a) The proposed DDot dot-product engine. Multi-wavelength signals propagate concurrently on the waveguide. (b) The proposed DPTC matrix-matrix multiplication unit with input WDM signals broadcasting.
  • Figure 3: Our design point is robust to non-ideal dispersion effects. Coupling coefficient $\kappa$ and phase shift $\phi$ are not sensitive to wavelength-dependent device responses (i.e., dispersion).
  • Figure 4: High-level architecture of the proposed Lightening-Transformer. It has a three-level memory hierarchy, multiple photonic analog computing tiles/cores, on-chip multi-wavelength light sources, and optical interconnects for data broadcast.
  • Figure 5: Tiling and spatial/temporal mapping for processing GEMM. $M_1$ is the weight matrix when processing the linear layer, which is loaded chunk by chunk from off-chip memory.
  • ...and 11 more figures
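
The tiling in Figure 5 and the analog-domain temporal accumulation mentioned in the TL;DR can be illustrated with a short sketch. It is only an illustration under stated assumptions: the inner dimension is split into chunks, the partial products of successive chunks are summed in an accumulator that stands in for the analog integrator, and a single A/D conversion per output is assumed to happen after the last chunk. The function name `tiled_gemm` and the `chunk` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def tiled_gemm(m1, m2, chunk):
    """Compute m1 @ m2 by accumulating partial products over inner-dimension chunks.

    Returns the result and the number of chunks processed. In this sketch the
    accumulator plays the role of analog time integration, so each output is
    converted once instead of once per chunk.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    inner = m1.shape[1]
    acc = np.zeros((m1.shape[0], m2.shape[1]))
    n_chunks = 0
    for start in range(0, inner, chunk):
        stop = min(start + chunk, inner)
        # One time step: partial product of the current chunk.
        acc += m1[:, start:stop] @ m2[start:stop, :]
        n_chunks += 1
    return acc, n_chunks
```

With an inner dimension of 10 and `chunk=4`, the loop runs three time steps. Converting after every step would cost three conversions per output element, whereas accumulating first costs one under the assumption above.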