Zero-Shot RTL Code Generation with Attention Sink Augmented Large Language Models
Selim Sandal, Ismail Akturk
TL;DR
This work addresses zero-shot RTL code generation from high-level hardware specifications using large language models augmented with an attention sink mechanism. By evaluating multiple instruction-tuned LLMs on a Neural Processing Unit (NPU) RTL case study, the authors compare dense, window, and attention-sink attention, demonstrating that attention sink markedly improves long-sequence code generation. The study reports that with attention sink, $4293$ of $4312$ RTL tokens are correct ($99.56\%$), requiring only $16$ fixes, versus hundreds to thousands of fixes for the other attention schemes, underscoring the method's practicality for automated design generation without RTL-focused fine-tuning. The results suggest a viable path for rapid, large-scale architectural exploration and design automation in hardware using LLMs, with attention mechanisms playing a pivotal role in maintaining specification integrity over extended outputs.
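To make the three attention schemes concrete, the sketch below lists which earlier token positions a given token may attend to under each one. It is an illustration under stated assumptions, not the authors' implementation: the function name `visible_positions`, the parameters `window` and `num_sinks`, and their default values are all hypothetical. It also assumes the commonly described form of attention sink, in which a few initial tokens stay visible alongside a sliding window of recent tokens.

```python
def visible_positions(query_pos, scheme, window=4, num_sinks=2):
    """Return the causal key positions visible to the token at query_pos.

    Hypothetical helper for illustration only; window and num_sinks are
    assumed values, not settings taken from the paper.
    """
    if scheme == "dense":
        # Dense attention: every earlier token and the token itself.
        return list(range(query_pos + 1))

    # The most recent `window` positions, ending at query_pos.
    recent = list(range(max(0, query_pos - window + 1), query_pos + 1))
    if scheme == "window":
        # Window attention: only the recent positions; older ones are dropped.
        return recent

    if scheme == "sink":
        # Attention sink (assumed form): keep the first `num_sinks` positions
        # in addition to the recent window, without duplicating overlaps.
        sinks = [p for p in range(min(num_sinks, query_pos + 1)) if p not in recent]
        return sinks + recent

    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("dense", "window", "sink"):
        print(scheme, visible_positions(9, scheme))
```

Under these assumptions, the window scheme loses the earliest tokens once the sequence outgrows the window, while the sink scheme keeps them visible at a fixed cost per token.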
Abstract
The design and optimization of hardware have traditionally been resource-intensive, demanding considerable expertise and dependence on established design automation tools. This paper explores the use of large language models to streamline code generation in hardware design. In contrast to earlier studies, it uses large language models that accept high-level design specifications through a single prompt and generate the corresponding Register-Transfer Level (RTL) code. Using large language models for RTL code generation not only expedites design iteration cycles but also facilitates the exploration of design spaces that pose computational challenges for conventional techniques. Through our evaluation, we demonstrate the shortcomings of existing attention mechanisms and show that language models can produce functional, optimized, and industry-standard-compliant RTL code when a novel attention mechanism is used. These findings underscore the expanding role of large language models in shaping the future landscape of architectural exploration and automation in hardware design.
