COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators

Jihoon Park; Jeongin Choe; Dohyun Kim; Jae-Joon Kim

TL;DR

COMPASS addresses the challenge of running large DNNs on resource-constrained crossbar-based PIM accelerators. It divides a network into partitions that each fit on chip, replaces the weights between partitions, and uses a genetic algorithm to optimize the grouping of partitions. The compiler framework combines partition generation with validity maps, memory-access-aware scheduling, and four mutation operators. It demonstrates up to $1.78\times$ throughput improvement and up to $1.28\times$ energy-delay product savings over baselines across CNNs such as VGG16, ResNet18, and SqueezeNet on SRAM-based PIM prototypes. The framework supports weight reloads from external memory and optimizes for data dependencies, core utilization, and write amortization to maximize performance under tight on-chip memory budgets. The work is presented as the first compiler approach that explicitly accounts for external memory communication in analog in-memory computing. It enables larger networks to be deployed on PIM hardware and is applicable to future non-volatile memory-based crossbars.

Abstract

Recently, crossbar-array-based in-memory accelerators have been gaining interest due to their high throughput and energy efficiency. While software and compiler support for in-memory accelerators has also been introduced, it is currently limited to the case where all weights are assumed to be on-chip. This limitation becomes apparent as network sizes grow significantly beyond the in-memory footprint, so weight replacement schemes are essential. We propose COMPASS, a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators. COMPASS specifically targets networks that exceed the capacity of the PIM crossbar arrays and therefore require access to external memory. We propose an algorithm to determine the optimal partitioning that divides the layers so that each partition can be accelerated on chip. Our scheme takes into account the data dependence between layers, core utilization, and the number of write instructions in order to minimize latency and memory accesses and to improve energy efficiency. Simulation results demonstrate that COMPASS can accommodate many more networks using a minimal memory footprint, while improving throughput by $1.78\times$ and providing $1.28\times$ savings in energy-delay product (EDP) over baseline partitioning methods.
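The partitioning constraint described in the abstract (each partition must fit on chip) can be illustrated with a small sketch. This is not the COMPASS algorithm, which uses a genetic search; it is only a greedy baseline written under stated assumptions. The function names `validity_map` and `greedy_partition`, the per-layer crossbar counts, and the capacity value are all hypothetical, and the sketch assumes that a partition is a run of consecutive layers and that every single layer fits on chip by itself.

```python
from typing import List, Tuple


def validity_map(layer_xbars: List[int], capacity: int) -> List[List[bool]]:
    """valid[i][j] is True when layers i..j (inclusive) fit on chip together.

    Hypothetical simplification: a layer's cost is just a crossbar count.
    """
    n = len(layer_xbars)
    valid = [[False] * n for _ in range(n)]
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += layer_xbars[j]
            if total > capacity:
                break  # adding more layers can only increase the total
            valid[i][j] = True
    return valid


def greedy_partition(layer_xbars: List[int], capacity: int) -> List[Tuple[int, int]]:
    """Pack consecutive layers into partitions, extending each while it stays valid."""
    n = len(layer_xbars)
    valid = validity_map(layer_xbars, capacity)
    partitions = []
    start = 0
    while start < n:
        if not valid[start][start]:
            # Assumption of this sketch: layers are never split across partitions.
            raise ValueError(f"layer {start} alone exceeds chip capacity")
        end = start
        while end + 1 < n and valid[start][end + 1]:
            end += 1
        partitions.append((start, end))
        start = end + 1
    return partitions


# Made-up crossbar counts for five layers and a made-up capacity of 10.
example = greedy_partition([4, 6, 8, 3, 5], capacity=10)
print(example)  # [(0, 1), (2, 2), (3, 4)]
```

Each tuple is an inclusive range of layer indices whose weights would be written to the crossbars together before that partition runs. A greedy packing like this ignores data dependence, core utilization, and write counts, which are the factors the abstract says COMPASS optimizes.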

Paper Structure (28 sections, 1 equation, 10 figures, 2 tables, 1 algorithm)

Introduction
PIM accelerator with model partitioning
Weight Replacement
Partitioned Model Execution
Compiler Framework
Overview of COMPASS Framework
Partition Generation
Validity Map
Non-crossbar-mapped Layers
Memory Access Management
COMPASS Algorithm
Partition Group Fitness
Partition Score and Selection
Mutation
Evaluation
...and 13 more sections

Figures (10)

Figure 1: In-memory DNN accelerator architecture with weight replacement
Figure 2: Partitioned model execution. At T0, the first partition runs with weights loaded into PIM memory, inputs processed, and outputs stored in the global memory. The stored output becomes the input for the next partition at T1.
Figure 3: COMPASS compiler framework overview
Figure 4: Model decomposition and partition generation
Figure 5: Partition validity map. Chip-S and Chip-L represent the small and large chip configurations, as detailed in Table \ref{tab:hw_config}. The models increase in size from SqueezeNet to VGG16, with details provided in Table \ref{tab:network}.
...and 5 more figures
