DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization
Yi Ren, Chenhao Xue, Jiaxing Zhang, Chen Zhang, Qiang Xu, Yibo Lin, Lining Zhang, Guangyu Sun
TL;DR
DiffuSE addresses cross-layer DNN accelerator design by casting design space exploration as an inverse mapping from QoR targets to parameter configurations, learned by a diffusion model. Guided by Pareto-aware conditioning, the model generates hardware–EDA parameter configurations that stay within the training-data distribution, enabling sample-efficient navigation of the multi-objective QoR space. The framework combines a diffusion generator, a gradient-guided predictor, and Pareto-frontier-based target selection to identify high-quality configurations quickly, achieving substantial improvements in power, performance, and area (PPA) and in hypervolume over multi-objective Bayesian optimization (MOBO) on a 7nm flow. The approach enables scalable, efficient optimization across complex design spaces and has practical relevance for cross-layer co-design of DNN accelerators.
Abstract
The proliferation of deep learning accelerators calls for efficient and cost-effective hardware design solutions, in which parameterized modular hardware generators and electronic design automation (EDA) tools play crucial roles in improving productivity and final Quality-of-Results (QoR). To balance multiple QoR metrics of interest (e.g., performance, power, and area), designers must navigate a vast design space that encompasses tunable parameters for both the hardware generator and the EDA synthesis tools. However, the long runtime of EDA tool invocations and the complex interplay among numerous design parameters make this task extremely challenging, even for experienced designers. To address these challenges, we introduce DiffuSE, a diffusion-driven design space exploration framework for cross-layer optimization of DNN accelerators. DiffuSE leverages conditional diffusion models to capture the inverse, one-to-many mapping from QoR objectives to parameter combinations, allowing targeted exploration within promising regions of the design space. By carefully selecting the conditioning QoR values, the framework achieves an effective trade-off among multiple QoR metrics in a sample-efficient manner. Experimental results under 7nm technology demonstrate the superiority of the proposed framework over prior art.
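To make the idea of selecting conditioning QoR values concrete, the following is a minimal sketch, not the authors' implementation. It assumes every QoR metric is minimized (for example latency, power, area), extracts the non-dominated points from already-evaluated designs, and proposes slightly more aggressive targets that a conditional generator could be conditioned on. The names `pareto_mask`, `propose_targets`, and the `shrink` factor are hypothetical and chosen only for illustration.

```python
import numpy as np


def pareto_mask(qor: np.ndarray) -> np.ndarray:
    """Return a boolean mask marking non-dominated rows of `qor`.

    Each row is one evaluated design; each column is one QoR metric.
    Assumption: all metrics are to be minimized.
    """
    n = qor.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Row j dominates row i if it is no worse in every metric
        # and strictly better in at least one.
        no_worse = np.all(qor <= qor[i], axis=1)
        strictly_better = np.any(qor < qor[i], axis=1)
        if np.any(no_worse & strictly_better):
            mask[i] = False
    return mask


def propose_targets(qor: np.ndarray, shrink: float = 0.05) -> np.ndarray:
    """Hypothetical target selection: scale each Pareto point toward the
    origin by `shrink`, asking for designs slightly better than the front."""
    front = qor[pareto_mask(qor)]
    return front * (1.0 - shrink)


# Toy data: two minimized metrics for five evaluated designs.
sample_qor = np.array([
    [1.0, 4.0],
    [2.0, 2.0],
    [4.0, 1.0],
    [3.0, 3.0],  # dominated by [2.0, 2.0]
    [2.0, 5.0],  # dominated by [1.0, 4.0]
])
sample_targets = propose_targets(sample_qor, shrink=0.05)
```

In a loop of this kind, the proposed targets would be fed to the generator as conditions, the resulting configurations evaluated by the EDA flow, and the front recomputed. How DiffuSE actually picks its conditioning values is not specified in this summary.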
