A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN
Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu
TL;DR
This paper addresses the challenge of efficiently deploying multi-tenant DNNs on ReRAM-based Processing-in-Memory (PIM) hardware by proposing a cross-level optimization framework that jointly tackles tenant-level hardware partitioning and fine-grained operator reconstruction. The approach combines a classic profiler to estimate inter-layer parallelism, intelligent area partitioning guided by a learning-rate-based iterative process, and Duplicator-Splitter driven reconstruction to form re-operators that fill on-chip resources with minimal waste. Empirical results show substantial speedups (up to 60.43x) and energy improvements (up to 1.89x) across varying chip topologies and network complexities, with different contributions from tenant- and operator-level optimizations depending on hardware scale. The work demonstrates a practical pathway to scalable, low-latency multi-tenant DNN deployment on ReRAM-based PIM designs, enabling more efficient utilization of on-chip resources in real-world AI workloads.
Abstract
Modern Artificial Intelligence (AI) applications increasingly rely on multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and in the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption, holds promising potential for supporting the deployment of multi-tenant DNNs. However, directly deploying complex multi-tenant DNNs on existing ReRAM-based PIM designs poses challenges. Resource contention among different tenants can result in severe under-utilization of on-chip computing resources. Moreover, area-intensive operators and computation-intensive operators require excessively large on-chip areas and long processing times, leading to high overall latency during parallel computing. To address these challenges, we propose a novel ReRAM-based in-memory computing framework that enables efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our approach tackles the resource contention problem by iteratively partitioning the PIM hardware at the tenant level. In addition, we construct a fine-grained reconstructed processing pipeline at the operator level to handle area-intensive operators. Compared to direct deployment on traditional ReRAM-based PIM designs, our proposed PIM computing framework achieves significant improvements in speed (ranging from 1.75x to 60.43x) and energy (up to 1.89x).
