On the Viability of using LLMs for SW/HW Co-Design: An Example in Designing CiM DNN Accelerators

Zheyu Yan, Yifan Qin, Xiaobo Sharon Hu, Yiyu Shi

TL;DR

This paper investigates using Large Language Models (LLMs) as optimizers for software-hardware co-design of Compute-in-Memory (CiM) DNN accelerators. It introduces LCDA, a framework that couples a GPT-4-based design optimizer with a design generator, performance evaluator, and cost evaluator to accelerate co-design over the NACIM baseline. Experiments on CIFAR-10 demonstrate up to $25\times$ faster design discovery with comparable accuracy and energy-latency trade-offs, though some CiM-specific behaviors require task-specific tuning. The work highlights the potential of LLM-informed co-design and outlines directions for explainability, open-source LLMs, and tooling to further advance this approach.
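The four components named above (design optimizer, design generator, performance evaluator, cost evaluator) can be pictured as a simple search loop. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the `Design` fields, the toy accuracy and energy formulas, the reward weighting, and the random `propose_design` stand-in for the GPT-4-based optimizer are all hypothetical. The default of 20 episodes mirrors the LCDA search budget mentioned in the Figure 5 caption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    channels: int   # hypothetical software knob: convolution channel width
    xbar_size: int  # hypothetical hardware knob: crossbar array dimension


def propose_design(history, rng):
    """Stand-in for the LLM-based design optimizer.

    A real optimizer would condition on `history` (for example through a
    prompt); this placeholder ignores it and samples at random.
    """
    return Design(
        channels=rng.choice([16, 32, 64, 128]),
        xbar_size=rng.choice([64, 128, 256]),
    )


def evaluate_accuracy(design):
    """Toy performance evaluator (hypothetical formula, not measured data)."""
    return min(0.95, 0.60 + 0.002 * design.channels)


def evaluate_energy(design):
    """Toy cost evaluator in arbitrary units (hypothetical formula)."""
    return design.channels * design.xbar_size / 1000.0


def reward(design, energy_weight=0.01):
    """Hypothetical scalar reward: accuracy minus a weighted energy penalty."""
    return evaluate_accuracy(design) - energy_weight * evaluate_energy(design)


def search(episodes=20, seed=0):
    """Run the propose/evaluate loop and return the best design found."""
    rng = random.Random(seed)
    history = []  # list of (design, reward) pairs fed back to the optimizer
    for _ in range(episodes):
        candidate = propose_design(history, rng)
        history.append((candidate, reward(candidate)))
    best_design, best_reward = max(history, key=lambda pair: pair[1])
    return best_design, best_reward, history


if __name__ == "__main__":
    best_design, best_reward, _ = search()
    print(best_design, round(best_reward, 4))
```

Swapping `propose_design` for a function that queries an LLM with the accumulated history is the only change this sketch would need to move from random search toward the LLM-guided loop the TL;DR describes.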

Abstract

Deep Neural Networks (DNNs) have demonstrated impressive performance across a wide range of tasks. However, deploying DNNs on edge devices poses significant challenges due to stringent power and computational budgets. An effective solution is software-hardware (SW-HW) co-design, which tailors DNN models and hardware architectures together so that they make the best use of available resources. However, SW-HW co-design traditionally suffers from slow optimization because its optimizers do not make use of heuristic knowledge, a limitation known as the "cold start" problem. In this study, we present a novel approach that leverages Large Language Models (LLMs) to address this issue. By utilizing the abundant knowledge of pre-trained LLMs in the co-design optimization process, we effectively bypass the cold start problem, substantially accelerating the design process. The proposed method achieves a significant speedup of 25x. This advancement paves the way for the rapid and efficient deployment of DNNs on edge devices.

Paper Structure (15 sections, 2 equations, 6 figures, 2 algorithms)

Figures (6)

  • Figure 1: CiM DNN accelerator.
  • Figure 2: Crossbar array.
  • Figure 4: Accuracy-energy trade-offs of different design candidates provided by LCDA (blue square) and NACIM (orange dot). The Y-axis represents accuracy and the X-axis represents energy consumption in pJ.
  • Figure 5: Rewards of different design candidates provided by LCDA (blue line) and NACIM (orange line). Panel (a) shows the results for the first 20 episodes and panel (b) shows the results for episodes 21 to 500. Note that we only perform 20 episodes of search in LCDA, so we use the maximum reward of the first 20 episodes of LCDA to project its results in (b).
  • Figure 6: Accuracy-latency trade-offs of different design candidates provided by LCDA (blue square) and NACIM (orange dot). The Y-axis represents accuracy and the X-axis represents latency in ns.
  • ...and 1 more figure