Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception

Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque

TL;DR

This work proposes a novel scheduling strategy for efficiently deploying perception workloads on multi-chip AI accelerators, and shows that it increases throughput by 82% and processing-engine utilization by 2.8× compared to monolithic accelerator designs.

Abstract

We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. The motivation is twofold: chiplet technology is becoming integral to emerging vehicular architectures, providing a cost-effective trade-off between performance, modularity, and customization; and perception models are the most computationally demanding workloads in an autonomous driving system. Using the Tesla Autopilot perception pipeline as a case study, we first break down its constituent models and profile their performance on different chiplet accelerators. From these insights, we propose a novel scheduling strategy to efficiently deploy perception workloads on multi-chip AI accelerators. Our experiments using a standard DNN performance simulator, MAESTRO, show that our approach increases throughput by 82% and processing-engine utilization by 2.8× compared to monolithic accelerator designs.

Paper Structure

This paper contains 21 sections, 1 equation, 12 figures, 3 tables.

Figures (12)

  • Figure 1: A descriptive schematic showing this work's scope in adopting accelerator MCMs as NPUs in self-driving platforms (e.g., Tesla FSD).
  • Figure 2: The four-stage perception pipeline, based on the HydraNet architecture [mullapudi2018hydranets], used by the Tesla Autopilot system [autonomous2021how], with feature dimensions and models displayed.
  • Figure 3: Breakdown of latency (top) and energy consumption (bottom) per perception component across Shidiannao- (left) and NVDLA-like (right) accelerators using MAESTRO for a single 256 accelerator chiplet.
  • Figure 4: Affinities of the feature extractors (top), spatio-temporal attention fusion (mid), and trunks (bot) towards Shidiannao- and NVDLA-like accelerators. $\Delta\mathrm{Value} < 0$ implies Shidiannao-like affinity; $\Delta\mathrm{Value} > 0$ implies NVDLA-like affinity.
  • Figure 5: The 8 FE+BFPN models mapping onto the MCM's first quadrant.
  • ...and 7 more figures