Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective

Luca Collini, Andrew Hennessee, Ramesh Karri, Siddharth Garg

TL;DR

This work probes whether reasoning-enabled LLMs can assist hardware design in High-Level Synthesis (HLS). It introduces an agentic optimization flow that automatically rewrites code, inserts pragmas, and performs full-system optimization under a latency-into-area constraint. It compares reasoning models (e.g., DeepSeek-R1, o3-mini) with non-reasoning baselines (e.g., DeepSeek-V3) on twelve benchmarks, using an ILP solver as part of the design-space search. Reasoning models achieve higher success rates but incur higher costs and token usage, while area-latency outcomes are largely comparable across models; formulating the ILP remains a weakness for all of them. The study also provides the first analysis of chain-of-thought (CoT) reasoning tokens in hardware tasks, highlighting both promise and gaps, and suggests directions for improving reasoning-driven hardware design automation and prompt engineering.
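
To make the ILP step concrete, here is one plausible formulation, offered as an assumption rather than the paper's exact model. Suppose each sub-kernel k = 1, ..., K has been synthesized into a set D_k of candidate design points, where point j has area a_{kj} and latency l_{kj}; assume areas and latencies add up when sub-kernels are composed, and that A_target is the area budget. The binary variable x_{kj} selects exactly one design point per sub-kernel:

```latex
\begin{aligned}
\min_{x} \quad & \sum_{k=1}^{K} \sum_{j \in D_k} l_{kj}\, x_{kj} \\
\text{s.t.} \quad & \sum_{k=1}^{K} \sum_{j \in D_k} a_{kj}\, x_{kj} \le A_{\text{target}} \\
& \sum_{j \in D_k} x_{kj} = 1, \qquad k = 1, \dots, K \\
& x_{kj} \in \{0, 1\}
\end{aligned}
```

This is a multiple-choice knapsack problem. Real designs may need other latency models (e.g., overlapping or pipelined sub-kernels), which would change the objective.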

Abstract

Recent Large Language Models (LLMs) such as OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). Their potential in hardware design, which relies on expert-driven iterative optimization, remains unexplored. This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization. During HLS, engineers manually define pragmas/directives to balance performance and resource constraints. We propose an agentic LLM-based optimization framework that automatically restructures code, inserts pragmas, and identifies optimal design points via feedback from HLS tools and access to integer-linear programming (ILP) solvers. Experiments compare reasoning models against conventional LLMs on benchmarks using success rate, efficiency, and design quality (area/latency) metrics, and provide the first-ever glimpse into the CoTs produced by a powerful open-source reasoning model such as DeepSeek-R1.

Paper Structure

This paper contains 19 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: HLS agentic flow with two optimization tasks.
  • Figure 2: Comparison of success rate, cost, and runtime between DeepSeek-V3, DeepSeek-R1, and o3-mini.
  • Figure 3: Comparison of synthesis results between DeepSeek-V3, DeepSeek-R1, and o3-mini. Vertical bars show min/max ranges; axes use log scales. Blue dashed lines indicate the target area used for each benchmark.
  • Figure 4: Solutions for AES sub-kernels for each model.
  • Figure 5: Average number of actions across all benchmarks for each LLM.