Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators

Zongle Huang, Hongyang Jia, Kaiwei Zou, Yongpan Liu

TL;DR

Scope is a merged pipeline framework for MCM NN accelerators that deploys multiple layers jointly rather than separately, improving throughput and scalability by relaxing the tradeoffs between computation, communication, and memory costs.

Abstract

Neural network (NN) accelerators with multi-chip-module (MCM) architectures enable integration of massive computation capability; however, they face challenges of computing resource underutilization and off-chip communication overheads. Traditional parallelization schemes for NN inference on MCM architectures, such as intra-layer parallelism and inter-layer pipelining, fail to overcome both challenges, limiting the scalability of MCM architectures. We observe that existing works typically deploy layers separately rather than considering them jointly. Leaving this multi-layer dimension underexploited forces compromises between system computation and communication, hindering optimal utilization, especially as hardware and software scale. To address this limitation, we propose Scope, a merged pipeline framework that incorporates this overlooked multi-layer dimension, thereby achieving improved throughput and scalability by relaxing tradeoffs between computation, communication, and memory costs. This new dimension, however, adds to the complexity of design space exploration (DSE). To tackle this, we develop a series of search algorithms that achieve exponential-to-linear complexity reduction while identifying solutions that rank in the top 0.05% of performance. Experiments show that Scope achieves up to 1.73x throughput improvement while maintaining similar energy consumption for ResNet-152 inference compared to state-of-the-art approaches.

Paper Structure (17 sections, 9 equations, 10 figures, 3 tables, 1 algorithm)

This paper contains 17 sections, 9 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) The execution of a segmented pipeline. (b) The tradeoff of a segmented pipeline. More segments mean fewer layers in each sequential deployment, which increases communication overhead, while fewer segments create deeper pipelines, leading to more bubbles and harder stage matching.
  • Figure 2: Comparison between the segmented pipeline and Scope. We open up the new cluster dimension, thus enabling multi-to-multi layer-to-chiplet mapping.
  • Figure 3: (a) The overview of the MCM structure. (b) The micro-architecture of a chiplet, containing hierarchical memory and computing units.
  • Figure 4: Typical intra-layer partitioning schemes on four chiplets.
  • Figure 5: Scope's decomposed execution and timeline within a segment, which is composed of pipelined clusters. Each cluster can be further broken down into layer-wise execution. We overlap the computing and communication phases to reduce processing time.
  • ...and 5 more figures
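
The pipeline-depth side of the tradeoff described in the Figure 1 caption can be illustrated with a toy model. The sketch below is not Scope's cost model; it assumes the textbook makespan of a linear pipeline (fill once, then the slowest stage paces every later input), and the function names `pipeline_time` and `bubble_fraction` as well as the stage times are hypothetical, chosen only for illustration.

```python
def pipeline_time(stage_times, num_inputs):
    """Makespan of a linear pipeline (toy model, not the paper's).

    The first input passes through every stage once (pipeline fill);
    each later input is paced by the slowest stage.
    """
    return sum(stage_times) + (num_inputs - 1) * max(stage_times)


def bubble_fraction(stage_times, num_inputs):
    """Fraction of the stages' combined time that is spent idle."""
    busy = num_inputs * sum(stage_times)
    available = len(stage_times) * pipeline_time(stage_times, num_inputs)
    return 1.0 - busy / available


if __name__ == "__main__":
    inputs = 8  # assumed number of inputs streamed through the pipeline
    cases = {
        "shallow, balanced": [1.0] * 4,
        "deep, balanced": [1.0] * 16,
        "shallow, mismatched": [1.0, 1.0, 1.0, 2.0],
    }
    for name, stages in cases.items():
        print(f"{name}: {bubble_fraction(stages, inputs):.3f}")
```

Under these assumptions, with 8 inputs the balanced 4-stage pipeline idles for roughly 27% of its stage time, the balanced 16-stage pipeline for roughly 65%, and the 4-stage pipeline with one slow stage for roughly 47%. This matches the caption's point that deeper pipelines and poorly matched stages both add bubbles. The communication side of the tradeoff (extra off-chip traffic when more segments are used) is deliberately left out of this sketch.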