AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations

Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna

TL;DR

The paper tackles design space exploration (DSE) for DNN accelerators, whose design spaces are non-uniform and non-convex. It introduces AIrchitect v2, an encoder-decoder transformer that uses contrastive learning to shape a uniform latent embedding and Unified Ordinal Vectors (UOV) to combine classification and regression in a single output representation. Evaluated with a MAESTRO-based hardware model on a dataset of 100,000 real DNN workloads, AIrchitect v2 is about 15% better than existing techniques at identifying optimal design points, and on unseen workloads (LLMs) the hardware it selects yields about a 1.7x improvement in inference latency. The authors release a large DSE dataset and show that the approach generalizes beyond the training workloads.

Abstract

Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural networks (DNNs) and the introduction of advanced foundational models (FMs), the design space for DNN accelerators is expanding at an exponential rate. Additionally, this space is highly non-uniform and non-convex, making it increasingly difficult to navigate and optimize. Traditional DSE techniques rely on search-based methods, which involve iterative sampling of the design space to find the optimal solution. However, this process is both time-consuming and often fails to converge to the global optima for such design spaces. Recently, AIrchitect v1, the first attempt to address the limitations of search-based techniques, transformed DSE into a constant-time classification problem using recommendation networks. In this work, we propose AIrchitect v2, a more accurate and generalizable learning-based DSE technique applicable to large-scale design spaces that overcomes the shortcomings of earlier approaches. Specifically, we devise an encoder-decoder transformer model that (a) encodes the complex design space into a uniform intermediate representation using contrastive learning and (b) leverages a novel unified representation blending the advantages of classification and regression to effectively explore the large DSE space without sacrificing accuracy. Experimental results evaluated on 10^5 real DNN workloads demonstrate that, on average, AIrchitect v2 outperforms existing techniques by 15% in identifying optimal design points. Furthermore, to demonstrate the generalizability of our method, we evaluate performance on unseen model workloads (LLMs) and attain a 1.7x improvement in inference latency on the identified hardware architecture.

Paper Structure

This paper contains 20 sections, 4 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Different design choices yield a wide range of performance, necessitating automated design space exploration. (i) Search-based methods involve iterative exploration, (ii) learning-based methods enable one-shot inference.
  • Figure 2: Overview of AIrchitect v2, highlighting (1) the multi-head self-attention-based encoder and decoder structure, (2) the latent embedding space improved by contrastive learning, and (3) the UOV output representation combining classification and regression.
  • Figure 3: Prominent challenges in the DSE dataset: (a) a non-uniform and non-convex landscape and (b) a long-tailed distribution of data samples over labels. Drawn from the problem space defined in the problem formulation section.
  • Figure 4: Complexity of the problem space from the problem definition table, visualizing the input features ($xy$-plane, processed with PCA) and the output configuration ($z$-axis, plotted into UOV buckets). This justifies the need for a sophisticated model architecture.
  • Figure 5: Visualization of the embedding space (a) without contrastive learning and (b) with contrastive learning. Employing contrastive learning results in a uniform embedding space. Different colors represent different classes of data samples.
  • ...and 4 more figures