HEANA: A Hybrid Time-Amplitude Analog Optical Accelerator with Flexible Dataflows for Energy-Efficient CNN Inference

Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan Thakkar

TL;DR

HEANA is a hybrid time-amplitude analog optical accelerator for CNN inference that addresses three limitations of prior MRR-based designs: crosstalk, dataflow rigidity, and limited in-situ accumulation. By integrating spectrally hitless TAOMs with balanced photo-charge accumulators (BPCAs), HEANA supports output-, input-, and weight-stationary dataflows and performs in-situ spatio-temporal accumulation, which greatly reduces the need for partial-sum (psum) buffers and external reduction networks. System-level evaluations across four CNNs show substantial throughput and energy-efficiency gains over prior incoherent DPUs, with minimal Top-1/Top-5 accuracy loss at 8-bit quantization. These results highlight HEANA’s potential to deliver scalable, energy-efficient photonic CNN acceleration with broad dataflow support.

Abstract

Several analog accelerators based on photonic microring resonators (MRRs) have been proposed to accelerate the inference of integer-quantized CNNs with remarkably higher throughput and energy efficiency than their electronic counterparts. However, existing analog photonic accelerators suffer from three shortcomings: (i) wavelength parallelism that is severely hampered by various crosstalk effects, (ii) an inability to support dataflows other than the weight-stationary dataflow, and (iii) a failure to fully leverage the ability of photodetectors to perform in-situ accumulations. These shortcomings collectively hamper the performance and energy efficiency of prior accelerators. To tackle these shortcomings, we present a novel Hybrid timE Amplitude aNalog optical Accelerator, called HEANA. HEANA employs hybrid time-amplitude analog optical multipliers (TAOMs) that increase its flexibility to support multiple dataflows. A spectrally hitless arrangement of TAOMs significantly reduces the crosstalk effects, thereby increasing the wavelength parallelism in HEANA. Moreover, HEANA employs our balanced photo-charge accumulators (BPCAs), which enable buffer-less, in-situ, temporal accumulations and eliminate the need for reduction networks, relieving HEANA of the related latency and energy overheads. Our evaluation for the inference of four modern CNNs indicates that, for equal-area comparisons, HEANA provides geometric-mean (gmean) improvements of at least 66x in frames per second (FPS) and 84x in FPS/W (energy efficiency) over two MRR-based analog CNN accelerators from prior work.
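To make the abstract's point about buffer-less temporal accumulation concrete, the following is a minimal software sketch, not the authors' hardware or simulator. It contrasts a dot product whose per-chunk partial sums are written to a buffer and reduced afterwards with one whose partial sums are accumulated in place and read out once. The function names, the `chunk` parameter, and the write/readout counters are all hypothetical and introduced only for illustration.

```python
def dot_with_psum_buffer(inputs, weights, chunk):
    """Compute a dot product chunk by chunk, storing each partial sum.

    The list `psums` stands in for a partial-sum buffer (an assumption of
    this sketch); a separate reduction step adds the entries at the end.
    Returns (result, number_of_buffer_writes).
    """
    psums = []
    for start in range(0, len(inputs), chunk):
        xs = inputs[start:start + chunk]
        ws = weights[start:start + chunk]
        psums.append(sum(x * w for x, w in zip(xs, ws)))
    return sum(psums), len(psums)


def dot_with_in_situ_accumulation(inputs, weights, chunk):
    """Compute the same dot product, accumulating partial sums in place.

    The scalar `acc` stands in for charge held on an accumulator (again an
    assumption of this sketch); it is read out once after the last chunk.
    Returns (result, number_of_readouts).
    """
    acc = 0
    for start in range(0, len(inputs), chunk):
        xs = inputs[start:start + chunk]
        ws = weights[start:start + chunk]
        acc += sum(x * w for x, w in zip(xs, ws))
    return acc, 1


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    w = [1, -1, 2, -2, 3, -3]
    print(dot_with_psum_buffer(x, w, chunk=2))
    print(dot_with_in_situ_accumulation(x, w, chunk=2))
```

Both functions return the same numerical result; the only difference in this toy model is how many intermediate values leave the accumulator, which is the overhead the abstract attributes to psum buffers and reduction networks.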

Paper Structure (29 sections, 3 equations, 17 figures, 6 tables)

Figures (17)

  • Figure 1: Comparison of CNN dataflow schemes: (a) Output Stationary, (b) Input Stationary, (c) Weight Stationary. The table reports the buffer accesses required by a DPU to process layer 5 of GoogleNet.
  • Figure 2: Illustration of common analog optical DPU organizations: (a) AMW DPU, (b) MAW DPU.
  • Figure 3: Schematic of the Dot Product Unit (DPU) of our HEANA accelerator.
  • Figure 4: (a) Structure of our microring modulator (MRM) based hybrid time-amplitude analog optical modulator (TAOM) connected to a balanced photocharge accumulator (BPCA) and (b) representation of analog signals (optical and electrical) at different stages of TAOM.
  • Figure 5: HEANA DPE consisting of two spectrally hitless TAOMs connected to our BPCA circuit. The inset shows analog representations of signals (both optical and electrical) at various stages of our DPE.
  • ...and 12 more figures