H3PIMAP: A Heterogeneity-Aware Multi-Objective DNN Mapping Framework on Electronic-Photonic Processing-in-Memory Architectures
Ziang Yin, Aashish Poonia, Ashish Reddy Bommana, Xinyu Zhao, Zahra Hojati, Tianlong Chen, Krishnendu Chakrabarty, Farshad Firouzi, Jeff Zhang, Jiaqi Gu
TL;DR
The paper tackles the data movement bottleneck in AI acceleration by proposing H3PIMAP, a heterogeneity-aware, two-stage mapping framework that jointly optimizes latency, energy, and accuracy across a four-tier electronic-photonic-PIM accelerator. It introduces a four-tier 3D-stacked architecture combining ReRAM/PIM, SRAM/PIM, photonics, and a global buffer, together with a dataflow and noise models that make the mapping robust to hardware non-idealities. Through Stage 1 Pareto optimization and Stage 2 accuracy-driven Row Remap, H3PIMAP achieves a 3.32$\times$ latency reduction across language and vision models relative to homogeneous mappings and, on LLMs, 77.0% lower latency with 14.6% lower energy at matched quality. The framework demonstrates the practical viability and scalability of hybrid electronic-photonic-PIM accelerators for next-generation AI workloads, enabling high throughput, low data movement cost, and resilience to hardware non-idealities.
Abstract
The future of artificial intelligence (AI) acceleration demands a paradigm shift beyond the limitations of purely electronic or photonic architectures. Photonic analog computing delivers unmatched speed and parallelism but struggles with data movement, robustness, and precision, while electronic processing-in-memory (PIM) enables energy-efficient computing by co-locating storage and computation but suffers from endurance and reconfiguration constraints, limiting it to static weight mapping. Neither approach alone achieves the balance needed for adaptive, efficient AI. To break this impasse, we study a hybrid electronic-photonic-PIM computing architecture and introduce H3PIMAP, a heterogeneity-aware mapping framework that seamlessly orchestrates workloads across electronic and optical tiers. By optimizing workload partitioning through a two-stage multi-objective exploration method, H3PIMAP harnesses light speed for high-throughput operations and PIM efficiency for memory-bound tasks. In system-level evaluations, H3PIMAP delivers a 3.32x latency reduction across language and vision models and, on large language models, achieves 77.0% lower latency with 14.6% lower energy at matched quality, outperforming homogeneous and naive mapping strategies. This proposed framework lays the foundation for hybrid AI accelerators, bridging the gap between electronic and photonic computation for next-generation efficiency and scalability.
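To make the idea of multi-objective exploration concrete, the following is a minimal sketch of Pareto filtering over candidate mappings. It is not the H3PIMAP algorithm: the `Mapping` type, the candidate names, and every number below are hypothetical values chosen only to illustrate Pareto dominance across latency, energy, and accuracy loss.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """Hypothetical candidate workload split; all objectives are 'lower is better'."""
    name: str
    latency: float   # arbitrary units
    energy: float    # arbitrary units
    acc_drop: float  # accuracy loss in percentage points


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    a_obj = (a.latency, a.energy, a.acc_drop)
    b_obj = (b.latency, b.energy, b.acc_drop)
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(cands: List[Mapping]) -> List[Mapping]:
    """Keep only candidates that no other candidate dominates (input order preserved)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up illustrative numbers, not results from the paper.
candidates = [
    Mapping("all_photonic", 1.0, 5.0, 0.8),
    Mapping("all_sram_pim", 4.0, 2.0, 0.1),
    Mapping("all_reram_pim", 6.0, 1.0, 0.5),
    Mapping("hybrid_a", 1.5, 2.5, 0.3),
    Mapping("hybrid_b", 5.0, 2.5, 0.4),  # worse than all_sram_pim in every objective
]

front = pareto_front(candidates)
for m in front:
    print(m.name, m.latency, m.energy, m.acc_drop)
```

In this toy setting, `hybrid_b` is discarded because another candidate is at least as good in all three objectives, while the remaining candidates each represent a different trade-off. A second stage, such as the accuracy-driven remapping the summary mentions, would then choose among or refine the surviving candidates.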
