Polaris: Multi-Fidelity Design Space Exploration of Deep Learning Accelerators

Chirag Sakhuja; Charles Hong; Calvin Lin

TL;DR

The paper tackles the high cost of exploring deep learning accelerator (DLA) design spaces, which stems from slow high-fidelity evaluation. It introduces Starlight, a predictor built on transfer learning and deep kernel learning that predicts the energy-delay product (EDP) measured by RTL simulation with 99% accuracy while remaining fast, and Polaris, a Bayesian-optimization-based design-space exploration (DSE) tool that uses Starlight in a multi-fidelity, RTL-in-the-loop setting. The key results are threefold: transfer learning lets Starlight train with about 61% fewer samples than DOSA's predictor; Polaris matches in under 35 minutes the designs that DOSA takes six hours to produce; and in under 3.3 hours Polaris finds designs with 2.7$\times$ lower EDP than the best designs found by DOSA. The approach enables rapid hardware/software co-design for DLAs and may extend to other hardware design spaces in which low- and high-fidelity evaluations are closely related.

Abstract

This paper presents a tool for automatically exploring the design space of deep learning accelerators (DLAs). Our main advancement is Starlight, a data-driven performance model that uses transfer learning to bridge the gap between fast, low-fidelity evaluation methods (such as analytical models) and slow, high-fidelity evaluation methods (such as RTL simulation). Starlight is fast: It can provide 6,500 predictions per second, allowing the evaluation of millions of configurations per hour. Starlight is accurate: It predicts the energy-delay product measured by RTL simulation with 99% accuracy. And Starlight can be trained efficiently: It can be trained with 61% fewer samples than DOSA's state-of-the-art data-driven performance predictor. Our second contribution is Polaris, a design-space exploration tool that uses Starlight to efficiently search the large, complex hardware/software co-design space of DLAs. In under 35 minutes, Polaris produces DLA designs that match the performance of designs that take six hours to produce with DOSA. And in under 3.3 hours, Polaris produces DLA designs that reduce energy-delay product by 2.7$\times$ over the best designs found by DOSA.
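
As a quick sanity check on the figures quoted in the abstract, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the `edp` helper and the energy and delay values passed to it are hypothetical, and the only inputs taken from the abstract are the 6,500 predictions per second, the 35-minute and six-hour search times, and the 2.7$\times$ improvement.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy multiplied by delay (lower is better)."""
    return energy_joules * delay_seconds


# Throughput quoted in the abstract, converted to predictions per hour.
PREDICTIONS_PER_SECOND = 6_500
predictions_per_hour = PREDICTIONS_PER_SECOND * 3_600

# Search-time comparison: six hours for DOSA vs. under 35 minutes for Polaris.
DOSA_MINUTES = 6 * 60
POLARIS_MINUTES = 35
search_time_ratio = DOSA_MINUTES / POLARIS_MINUTES

# Hypothetical baseline design (values chosen only for illustration).
baseline_edp = edp(energy_joules=2.0, delay_seconds=0.5)
# A design that is 2.7x better in EDP has the baseline EDP divided by 2.7.
improved_edp = baseline_edp / 2.7

print(f"{predictions_per_hour:,} predictions per hour")
print(f"search time ratio: at least {search_time_ratio:.1f}x")
print(f"baseline EDP {baseline_edp:.3f} J*s -> improved EDP {improved_edp:.3f} J*s")
```

At the quoted rate, one hour yields 23.4 million predictions, which is consistent with the "millions of configurations per hour" claim, and the 35-minute result corresponds to a search that is at least roughly ten times shorter than six hours.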
