AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations
Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna
TL;DR
The paper tackles design space exploration (DSE) for DNN accelerators, whose design landscapes are non-uniform and non-convex. It introduces AIrchitect v2, an encoder-decoder transformer that uses contrastive learning to shape a uniform latent embedding and Unified Ordinal Vectors to unify classification and regression. Evaluated with a MAESTRO-based hardware model on a dataset of 100,000 real DNN workloads, AIrchitect v2 achieves about a 15% improvement in identifying optimal designs. On unseen LLM workloads, the hardware architecture it identifies yields about a 1.7x improvement in inference latency. The authors release a large DSE dataset and demonstrate that the approach generalizes beyond the training workloads, enabling scalable, accurate DSE for modern AI workloads.
Abstract
Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural networks (DNNs) and the introduction of advanced foundational models (FMs), the design space for DNN accelerators is expanding at an exponential rate. Additionally, this space is highly non-uniform and non-convex, making it increasingly difficult to navigate and optimize. Traditional DSE techniques rely on search-based methods, which iteratively sample the design space to find the optimal solution. However, this process is time-consuming and often fails to converge to the global optimum in such design spaces. Recently, AIrchitect v1, the first attempt to address the limitations of search-based techniques, transformed DSE into a constant-time classification problem using recommendation networks. In this work, we propose AIrchitect v2, a more accurate and generalizable learning-based DSE technique applicable to large-scale design spaces that overcomes the shortcomings of earlier approaches. Specifically, we devise an encoder-decoder transformer model that (a) encodes the complex design space into a uniform intermediate representation using contrastive learning and (b) leverages a novel unified representation blending the advantages of classification and regression to effectively explore the large DSE space without sacrificing accuracy. Experimental results evaluated on 10^5 real DNN workloads demonstrate that, on average, AIrchitect v2 outperforms existing techniques by 15% in identifying optimal design points. Furthermore, to demonstrate the generalizability of our method, we evaluate performance on unseen model workloads (LLMs) and attain a 1.7x improvement in inference latency on the identified hardware architecture.
