Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-based Accelerators

Abhijit Das, Enrico Russo, Maurizio Palesi

TL;DR

MOHaM tackles the challenge of jointly selecting sub-accelerators, configuring them, placing them in a Network-on-Package (NoP), and mapping the layers of multiple DNNs onto chiplet-based accelerators. It introduces a multi-objective hardware-mapping co-optimisation framework that uses NSGA-II with custom genetic operators to produce Pareto-optimal Multi-Accelerator Systems and schedules, evaluated with Timeloop+Accelergy cost models. Compared to state-of-the-art baselines, the results show latency and energy reductions of up to around 96% and broad Pareto coverage when using heterogeneous chiplets, highlighting the value of co-optimising hardware and mapping for multi-DNN workloads. MOHaM demonstrates scalability and design-time applicability, with potential impact on next-generation multi-tenant DNN accelerators and ADAS/AR/VR hardware design. The work also suggests that faster cost models could further accelerate the search.

Abstract

The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings together heterogeneous sub-accelerators in the form of chiplets, interconnected by a Network-on-Package (NoP). This paper addresses the challenge of selecting the most suitable sub-accelerators, configuring them, determining their optimal placement in the NoP, and mapping the layers of a predetermined set of DNNs spatially and temporally. The objective is to minimise execution time and energy consumption during parallel execution while also minimising the overall cost, specifically the silicon area, of the accelerator. This paper presents MOHaM, a framework for multi-objective hardware-mapping co-optimisation for multi-DNN workloads on chiplet-based accelerators. MOHaM exploits a multi-objective evolutionary algorithm that has been specialised for the given problem by incorporating several customised genetic operators. MOHaM is evaluated against state-of-the-art Design Space Exploration (DSE) frameworks on different multi-DNN workload scenarios. The solutions discovered by MOHaM are Pareto optimal compared to those by the state-of-the-art. Specifically, MOHaM-generated accelerator designs can reduce latency by up to 96% and energy by up to 96.12%.
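To make "Pareto optimal" concrete for the three objectives named in the abstract (latency, energy, area, all minimised), the following is a minimal sketch of a non-dominated filter. It is not MOHaM's NSGA-II implementation; the names `Design`, `dominates`, `pareto_front` and the candidate values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical design point: (latency, energy, area), all to be minimised.
Design = Tuple[float, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design dominates, in input order."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Made-up candidate designs (arbitrary units), not results from the paper.
candidates = [
    (10.0, 5.0, 3.0),
    (8.0, 6.0, 3.0),   # dominated by (8.0, 6.0, 2.5): same latency and energy, larger area
    (12.0, 5.0, 4.0),  # dominated by (10.0, 5.0, 3.0): slower and larger at equal energy
    (8.0, 6.0, 2.5),
]
front = pareto_front(candidates)
print(front)
```

The two surviving points trade latency against energy, so neither is better on every objective; a multi-objective search returns such a set rather than a single design.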

Paper Structure

This paper contains 36 sections, 12 equations, 12 figures, 5 tables, 1 algorithm.

Figures (12)

  • Figure 1: Need for heterogeneity, co-optimisation and multi-objective exploration.
  • Figure 2: Inputs, output and internal organisation of the proposed MOHaM framework.
  • Figure 3: MOHaM global scheduler chromosome structure.
  • Figure 4: MOHaM-specific genetic operators.
  • Figure 5: Scheduling Gantt chart for latency, pie charts for area and energy breakdowns of a Pareto solution.
  • ...and 7 more figures