HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads

Raveesh Garg, Michael Pellauer, Tushar Krishna

TL;DR

The paper tackles the challenge of mixed-reuse AI workloads by introducing HARP, a taxonomy that classifies hierarchical and heterogeneous processors (HHPs) along axes of compute location and heterogeneity. It pairs the taxonomy with a Timeloop-based evaluation framework (including a modified cost model) to study how resource partitioning and mapping affect performance and energy for transformer workloads. Key contributions include the HARP taxonomy, a framework for black-box mapping of sub-accelerators, and a detailed empirical study showing when heterogeneous and hierarchical designs outperform homogeneous ones (notably in decoder-only models) and how bandwidth partitioning influences outcomes. The work provides a structured design space and actionable insights for building energy-efficient accelerators that can hide low-reuse operations behind high-reuse computations in mixed-reuse AI workloads.

Abstract

Artificial intelligence (AI) application domains consist of a mix of tensor operations with high and low arithmetic intensities (aka reuse). Hierarchical (i.e., compute along multiple levels of the memory hierarchy) and heterogeneous (multiple different sub-accelerators) accelerators are emerging as a popular way to process mixed-reuse workloads and workloads that consist of tensor operators with diverse shapes. However, the space of hierarchical and/or heterogeneous processors (HHPs) is relatively under-explored. Prior works have proposed custom architectures that take advantage of heterogeneity by providing multiple sub-accelerators, each efficient for different operator shapes. In this work, we propose HARP, a taxonomy to classify various hierarchical and heterogeneous accelerators, and use it to study the impact of heterogeneity at various levels in the architecture. The HARP taxonomy captures various ways in which HHPs can be conceived, ranging from B100 cores with "intra-node heterogeneity" between the SM and the tensor core to NeuPIM with cross-depth heterogeneity, which occurs at different levels of the memory hierarchy. We use the Timeloop mapper to find the best mapping for sub-accelerators and also modify the Timeloop cost model to extend it to model hierarchical and heterogeneous accelerators.
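To make the notion of high and low arithmetic intensity concrete, the following is a minimal roofline sketch. It is not the paper's Timeloop-based cost model; the `Accelerator` class, the hardware numbers, the matrix shapes, and the assumption that every operand crosses the memory interface exactly once are all illustrative choices made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator described by two roofline parameters."""
    peak_flops: float     # peak compute throughput in FLOP/s (assumed value)
    mem_bandwidth: float  # off-chip memory bandwidth in bytes/s (assumed value)


def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity (FLOP/byte) of C[m,n] = A[m,k] @ B[k,n].

    Simplifying assumption: A, B, and C each move to or from memory once,
    i.e. on-chip reuse is perfect. Real mappings usually do worse.
    """
    flops = 2 * m * n * k  # one multiply and one add per MAC
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


def attainable_flops(acc: Accelerator, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(acc.peak_flops, acc.mem_bandwidth * intensity)


# Illustrative numbers only: 100 TFLOP/s compute, 1 TB/s memory bandwidth.
acc = Accelerator(peak_flops=100e12, mem_bandwidth=1e12)

# A prefill-like GEMM (many tokens) versus a decode-like GEMV (one token).
gemm = matmul_intensity(m=2048, n=4096, k=4096)
gemv = matmul_intensity(m=1, n=4096, k=4096)

for name, ai in (("GEMM", gemm), ("GEMV", gemv)):
    bound = attainable_flops(acc, ai)
    regime = "compute-bound" if bound == acc.peak_flops else "memory-bound"
    print(f"{name}: {ai:.2f} FLOP/byte, {bound / 1e12:.2f} TFLOP/s ({regime})")
```

Under these assumptions the GEMM reaches 1024 FLOP/byte and sits on the compute roof, while the GEMV stays just under 1 FLOP/byte and is limited by memory bandwidth. This gap is the kind of mixed-reuse behavior that motivates giving high-reuse and low-reuse operators different sub-accelerators.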

Paper Structure

This paper contains 30 sections, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Roofline in heterogeneous accelerator with high- and low-reuse sub-accelerators compared to homogeneous accelerator with total area and memory bandwidth.
  • Figure 2: Partitioning of LLB resources.
  • Figure 3: (a) Intra-cascade partitioning within the encoder model. (b) Inter-cascade partitioning of the decoder model into prefill and decode phases with the prefill phase mapped on high-reuse sub-accelerator and decode phase mapped on low-reuse sub-accelerator.
  • Figure 4: Various examples of hierarchical and/or heterogeneous processors described by the HARP taxonomy. The square and chevron shapes represent different sub-accelerator architectures, while C represents the FSM controller tied to the sub-accelerators.
  • Figure 5: Evaluation framework built on Timeloop.
  • ...and 5 more figures