H3PIMAP: A Heterogeneity-Aware Multi-Objective DNN Mapping Framework on Electronic-Photonic Processing-in-Memory Architectures

Ziang Yin, Aashish Poonia, Ashish Reddy Bommana, Xinyu Zhao, Zahra Hojati, Tianlong Chen, Krishnendu Chakrabarty, Farshad Firouzi, Jeff Zhang, Jiaqi Gu

TL;DR

The paper tackles the memory movement bottleneck in AI by proposing H3PIMAP, a heterogeneity-aware, two-stage mapping framework that jointly optimizes latency, energy, and accuracy across a four-tier Electronic-Photonic-PIM accelerator. It introduces a four-tier 3D-stacked architecture combining ReRAM/PIM, SRAM/PIM, photonics, and a global buffer, along with a dataflow and noise models that enable robust mapping. Through Stage 1 Pareto optimization and Stage 2 accuracy-driven Row Remap, H3PIMAP achieves substantial performance gains, including a representative 3.32$\times$ latency reduction vs homogeneous mappings and up to 77.0$\times$ latency reduction with 14.6$\times$ energy savings on LLMs at matched quality. The framework demonstrates the practical viability and scalability of hybrid electronic-photonic-PIM accelerators for next-generation AI workloads, enabling high throughput, low movement cost, and resilience to hardware non-idealities.

Abstract

The future of artificial intelligence (AI) acceleration demands a paradigm shift beyond the limitations of purely electronic or photonic architectures. Photonic analog computing delivers unmatched speed and parallelism but struggles with data movement, robustness, and precision, while electronic processing-in-memory (PIM) enables energy-efficient computing by co-locating storage and computation but suffers from endurance and reconfiguration constraints, limiting it to static weight mapping. Neither approach alone achieves the balance needed for adaptive, efficient AI. To break this impasse, we study a hybrid electronic-photonic-PIM computing architecture and introduce H3PIMAP, a heterogeneity-aware mapping framework that seamlessly orchestrates workloads across electronic and optical tiers. By optimizing workload partitioning through a two-stage multi-objective exploration method, H3PIMAP harnesses light speed for high-throughput operations and PIM efficiency for memory-bound tasks. In system-level evaluations, H3PIMAP delivers a 3.32x latency reduction across language and vision models and, on large language models, achieves 77.0% lower latency with 14.6% lower energy at matched quality, outperforming homogeneous and naive mapping strategies. This proposed framework lays the foundation for hybrid AI accelerators, bridging the gap between electronic and photonic computation for next-generation efficiency and scalability.
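
The two-stage idea described above (first find latency-energy Pareto-optimal mappings, then give up some efficiency until an accuracy target is met) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the `Mapping` type, the candidate names, and every number below are hypothetical, and Stage 2 is reduced here to choosing among precomputed candidates rather than remapping rows.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    """One hypothetical candidate mapping with made-up cost estimates."""
    name: str
    latency: float   # arbitrary units, lower is better
    energy: float    # arbitrary units, lower is better
    accuracy: float  # fraction in [0, 1], higher is better


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if a is no worse than b in latency and energy, and strictly better in one."""
    return (a.latency <= b.latency and a.energy <= b.energy
            and (a.latency < b.latency or a.energy < b.energy))


def pareto_front(candidates: List[Mapping]) -> List[Mapping]:
    """Stage 1 stand-in: keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


def pick_mapping(front: List[Mapping], target_accuracy: float) -> Optional[Mapping]:
    """Stage 2 stand-in: fastest front member that still meets the accuracy target."""
    feasible = [m for m in front if m.accuracy >= target_accuracy]
    return min(feasible, key=lambda m: m.latency) if feasible else None


# Invented numbers for illustration only; none come from the paper.
candidates = [
    Mapping("all-photonic", 1.0, 9.0, 0.80),
    Mapping("all-sram",     4.0, 4.0, 0.95),
    Mapping("all-reram",    6.0, 2.0, 0.90),
    Mapping("hybrid-a",     2.0, 5.0, 0.88),
    Mapping("hybrid-b",     5.0, 5.0, 0.96),  # dominated by all-sram
]

front = pareto_front(candidates)
choice = pick_mapping(front, target_accuracy=0.85)
print([m.name for m in front])
print(choice.name if choice else "no mapping meets the target")
```

In this toy data the fastest candidate misses the accuracy target, so the selection falls back to a slower mapping on the front, which mirrors the efficiency-for-accuracy trade that the framework's second stage is described as making.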

Paper Structure

This paper contains 20 sections, 3 equations, 8 figures, 6 tables, 2 algorithms.

Figures (8)

  • Figure 1: 2D/3D heterogeneous electronic-photonic-PIM architecture with ReRAM, SRAM, and photonics.
  • Figure 2: Overview of our heterogeneous layer-to-hardware mapping flow. Stage 1 explores the Pareto-optimal mappings in the latency-energy space. Stage 2 adjusts mapping to trade efficiency for higher accuracy until the target accuracy is met.
  • Figure 3: Communication cost between two Conv2D layers with input sizes [8, 3, 32, 32] and [8, 16, 32, 32] in a 10$\times$10 PIM mesh.
  • Figure 4: Energy/latency improves during stage 1 search.
  • Figure 5: Layer-wise workload distribution and row assignment of Pythia-70M among three devices. ➊ The upper two panels show the workload distribution under H3PIMAP Pareto optimization (PO); ➋ the lower two show the distribution under PO plus row remapping (RR).
  • ...and 3 more figures