Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective
Luca Collini, Andrew Hennessee, Ramesh Karri, Siddharth Garg
TL;DR
This work probes whether reasoning-enabled LLMs can assist hardware design in High-Level Synthesis (HLS) by introducing an agentic optimization flow that automatically rewrites code, inserts pragmas, and performs full-system optimization under a latency-into-area constraint. It compares reasoning models (e.g., DeepSeek-R1, o3-mini) with non-reasoning baselines (e.g., DeepSeek-V3) on twelve benchmarks, using an ILP solver as part of the design-space search. Results show that reasoning models achieve higher success rates but incur higher costs and token usage, while area-latency outcomes are largely comparable across models; ILP formulation remains a shortcoming for all. The study also provides the first analysis of CoT reasoning tokens in hardware tasks, highlighting both promise and gaps and suggesting directions for improving reasoning-driven hardware design automation and prompt engineering.
Abstract
Recent Large Language Models (LLMs) such as OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). Their potential in hardware design, which relies on expert-driven iterative optimization, remains unexplored. This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization. During HLS, engineers manually define pragmas/directives to balance performance and resource constraints. We propose an agentic LLM-based optimization framework that automatically restructures code, inserts pragmas, and identifies optimal design points via feedback from HLS tools and access to integer-linear programming (ILP) solvers. Experiments compare reasoning models against conventional LLMs on benchmarks using success rate, efficiency, and design quality (area/latency) metrics, and provide the first-ever glimpse into the CoTs produced by a powerful open-source reasoning model like DeepSeek-R1.
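To make the design-point selection step concrete, here is a minimal sketch of the kind of problem an ILP solver could be asked to solve in such a flow: pick one implementation option per function so that total latency is minimized while total area stays within a budget. Everything in it is an assumption made for illustration. The function names, the (area, latency) numbers, the additive cost model, and the helper `select_design_points` are hypothetical, and exhaustive enumeration stands in for a real ILP solver; it is not the formulation used in the paper.

```python
from itertools import product

# Hypothetical per-function design options as (area, latency) pairs.
# In a real flow these would come from HLS synthesis reports.
options = {
    "load":    [(100, 40), (180, 22), (320, 12)],
    "compute": [(250, 90), (400, 50), (700, 28)],
    "store":   [(80, 30), (150, 16)],
}


def select_design_points(options, area_budget):
    """Pick one option index per function, minimizing total latency
    subject to total area <= area_budget.

    Assumes area and latency simply add up across functions.
    Returns (selection, latency, area), or None if nothing fits.
    Brute-force enumeration is used here in place of an ILP solver.
    """
    names = list(options)
    best = None
    for choice in product(*(range(len(options[n])) for n in names)):
        area = sum(options[n][i][0] for n, i in zip(names, choice))
        latency = sum(options[n][i][1] for n, i in zip(names, choice))
        if area > area_budget:
            continue  # violates the area constraint
        # Prefer lower latency; break ties by smaller area.
        if best is None or (latency, area) < (best[1], best[2]):
            best = (dict(zip(names, choice)), latency, area)
    return best


if __name__ == "__main__":
    print(select_design_points(options, area_budget=800))
```

Enumeration is only practical for a handful of functions and options; the same objective and constraint could be handed to an ILP solver with one binary variable per (function, option) pair and a constraint that exactly one option is chosen per function.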
