Speculative Decoding for Verilog: Speed and Quality, All in One
Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu
TL;DR
This work tackles Verilog code generation with large language models, addressing the tokenization and syntactic challenges that arise from Verilog's rigid grammar and its limited presence in training data. It introduces syntax-enriched speculative decoding, which combines multiple decoding heads with decoding stops aligned to syntactically significant fragments derived from abstract syntax trees (ASTs), accelerating inference while preserving structural correctness. The approach uses 10 decoding heads and a syntax-aware labeling scheme and is evaluated on CodeLlama and CodeT5p. It reports up to a 5.05x speedup and up to a 17.19% increase in pass@10 functional accuracy on RTLLM, along with gains in functional and syntactic accuracy over both Medusa and standard next-token prediction (NTP). In practice, the results indicate that aligning decoding with language syntax can make Verilog generation more reliable and efficient for hardware design tasks, extending the applicability of LLMs to specialized programming languages.
Abstract
The rapid advancement of large language models (LLMs) has revolutionized code generation tasks across various programming languages. However, the unique characteristics of programming languages, particularly languages such as Verilog that have specialized syntax and are underrepresented in training datasets, pose significant challenges for conventional tokenization and decoding approaches. In this paper, we introduce a novel application of speculative decoding for Verilog code generation, showing that it can improve both inference speed and output quality, effectively achieving speed and quality all in one. Unlike standard LLM tokenization schemes, which often fragment meaningful code structures, our approach aligns decoding stops with syntactically significant tokens, making it easier for models to learn the token distribution. This refinement addresses inherent tokenization issues and improves the model's ability to capture Verilog's logical constructs. Our experimental results show that our method achieves up to a 5.05x speedup in Verilog code generation and increases pass@10 functional accuracy on RTLLM by up to 17.19% compared to conventional training strategies. These findings highlight speculative decoding as a promising approach to bridge the quality gap in code generation for specialized programming languages.
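
The idea of aligning decoding stops with syntactically significant tokens can be illustrated with a small sketch. It is not the paper's implementation: the boundary set `BOUNDARY_TOKENS` and the helpers `accepted_prefix_len` and `trim_to_syntax_boundary` are hypothetical names introduced here, the token lists stand in for real tokenizer output, and a fixed token set replaces the AST-derived fragment labels described above. The sketch assumes greedy verification, where drafted tokens are kept only while they match what the base model would have produced, and then cuts the accepted run back to the last token that closes a syntactic fragment.

```python
# Hypothetical stand-in for AST-derived fragment boundaries (assumption, not from the paper).
BOUNDARY_TOKENS = {";", "begin", "end", "endmodule"}


def accepted_prefix_len(draft, target):
    """Count leading draft tokens that match the base model's tokens (greedy verification)."""
    n = 0
    for drafted, verified in zip(draft, target):
        if drafted != verified:
            break
        n += 1
    return n


def trim_to_syntax_boundary(tokens, n_accepted):
    """Shrink the accepted prefix so that it ends on a boundary token.

    If the accepted prefix contains no boundary token, keep it unchanged
    so that decoding still makes progress.
    """
    for k in range(n_accepted, 0, -1):
        if tokens[k - 1] in BOUNDARY_TOKENS:
            return k
    return n_accepted


# Tokens proposed by the extra decoding heads versus the base model's own choices.
draft = ["assign", "y", "=", "a", "&", "b", ";", "assign", "z"]
target = ["assign", "y", "=", "a", "&", "b", ";", "assign", "w"]

n = accepted_prefix_len(draft, target)   # the first 8 tokens agree
k = trim_to_syntax_boundary(draft, n)    # cut back to the ";" at position 7
emitted = draft[:k]
print(" ".join(emitted))
```

In this toy run the step stops after the complete statement `assign y = a & b ;` rather than after the dangling `assign`, so the next decoding step starts at a fragment boundary.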
