LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

TL;DR

The paper addresses the challenge of learning high-coverage hardware verification policies under expensive, non-differentiable simulator feedback. It proposes LLM4Cov, an offline execution-grounded framework that models verification as memoryless state transitions $s_t = (\mathcal{R}, x_t, o_t)$, each state carrying a scalar coverage score $\mathrm{Cov}(s_t) \in [0,1]$. The framework builds offline supervision through execution-validated data curation, coverage-guided agentic rejection fine-tuning, and verification-conditioned progressive learning. On the reality-aligned CVDP-ECov benchmark, a compact 4B model achieves a $69.2\%$ coverage pass rate and $90.4\%$ average coverage, outperforming its 30B teacher and approaching the results of much larger models. The results demonstrate that specialized agentic supervision under execution constraints can rival the benefits of large-scale model scaling, and the approach remains compatible with RL or online fine-tuning when simulator budgets permit.
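To make the state formulation and the two reported metrics concrete, here is a minimal Python sketch. It assumes that $\mathcal{R}$ is the design under test, $x_t$ the current testbench draft, and $o_t$ the latest simulator feedback, and that a task counts as a coverage pass when its final coverage reaches some threshold. The `VerifState` class, the `coverage_metrics` function, and the `threshold` parameter are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class VerifState:
    """Memoryless state s_t = (R, x_t, o_t); field meanings are assumptions."""
    design: str      # R: design under test (assumed to be RTL source text)
    testbench: str   # x_t: current testbench draft
    feedback: str    # o_t: latest simulator output (assumed free-form text)
    coverage: float  # Cov(s_t) in [0, 1], as reported by the simulator

    def __post_init__(self) -> None:
        # Enforce the [0, 1] range stated for Cov(s_t).
        if not 0.0 <= self.coverage <= 1.0:
            raise ValueError("coverage must lie in [0, 1]")


def coverage_metrics(final_states: list[VerifState], threshold: float) -> tuple[float, float]:
    """Return (coverage pass rate, average coverage) over final per-task states.

    `threshold` is a hypothetical pass criterion; the paper's exact rule may differ.
    """
    covs = [s.coverage for s in final_states]
    pass_rate = sum(c >= threshold for c in covs) / len(covs)
    return pass_rate, mean(covs)
```

Because the state is memoryless, a policy in this sketch would map a single `VerifState` to the next testbench without seeing earlier turns, so everything it needs must be packed into $(\mathcal{R}, x_t, o_t)$.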

Abstract

Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as memoryless state transitions guided by deterministic evaluators. Building on this formulation, we introduce execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling to enable scalable learning under execution constraints. We further curate a reality-aligned benchmark adapted from an existing verification suite through a revised evaluation protocol. Using the proposed pipeline, a compact 4B-parameter model achieves 69.2% coverage pass rate under agentic evaluation, outperforming its teacher by 5.3% and demonstrating competitive performance against models an order of magnitude larger.
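The abstract names worst-state-prioritized sampling without spelling it out. One simple reading, sketched below purely as an assumption, is to rank the intermediate states of collected trajectories by coverage and start new synthesis from the lowest-coverage ones, where the student most needs recovery data. The `worst_states` helper, the `(state_id, coverage)` trajectory format, and the exclusion of terminal and fully covered states are all hypothetical choices, not the paper's method.

```python
import heapq
from typing import Sequence


def worst_states(
    trajectories: Sequence[Sequence[tuple[str, float]]], k: int
) -> list[tuple[str, float]]:
    """Pick the k lowest-coverage intermediate states across all trajectories.

    Hypothetical format: each trajectory is a sequence of (state_id, coverage)
    pairs. The terminal state of each trajectory is excluded, and states that
    already reach full coverage are skipped, since they need no recovery.
    """
    candidates = [
        (state_id, cov)
        for traj in trajectories
        for state_id, cov in traj[:-1]  # intermediate states only
        if cov < 1.0
    ]
    # Lowest coverage first; ties keep their original order.
    return heapq.nsmallest(k, candidates, key=lambda item: item[1])
```

`heapq.nsmallest` keeps the selection cheap even over many trajectories and breaks coverage ties by the candidates' original order.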

Paper Structure (21 sections, 12 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Execution-aware verification loop and its dominant cost in modern hardware design [hegde2025llms, foster2024wgicasic, foster2017dvcon].
  • Figure 2: Coverage pass rates of existing LLMs. Results are measured in the agentic setting on our benchmark (see the benchmark and metrics section).
  • Figure 3: Main components of LLM4Cov. (a) The framework converts simulator coverage feedback into stable offline supervision through staged, execution-aware training aligned with the evolving student distribution. (b) Coverage-Guided Agentic Rejection Fine-tuning retains low-coverage drafts and their most coverage-improving revisions, concentrating supervision on recovery behaviors. (c) Verification-Conditioned Progressive Learning generates and trains on staged synthetic trajectories conditioned on the current student, yielding progressively stronger agentic performance and more stable final coverage.
  • Figure 4: Agentic trace taxonomy under intermediate-state distribution drift. Full-teacher traces are omitted in Stage 2 since the relative gap between imitation-style and full-teacher supervision arises from state-distribution mismatch, and is not expected to vary qualitatively with the teacher–student performance gap.
  • Figure 5: Comparison between intermediate state selection strategies in Stage 1. Evaluated under the agentic setting.
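Figure 3(b) describes the rejection step only at the level of its caption: low-coverage drafts are kept together with their most coverage-improving revisions. A minimal sketch of such a filter follows, assuming each draft is simulated alongside several candidate revisions and that "low-coverage" means falling below a fixed cut-off. `Candidate`, `select_recovery_pair`, and `low_cov` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    testbench: str   # hypothetical: testbench text for a draft or revision
    coverage: float  # simulator-reported coverage in [0, 1]


def select_recovery_pair(
    draft: Candidate,
    revisions: list[Candidate],
    low_cov: float,
) -> Optional[tuple[Candidate, Candidate]]:
    """Keep (draft, best revision) only when the draft is low-coverage and
    the best revision strictly improves coverage; otherwise reject (None).

    `low_cov` is a hypothetical cut-off for what counts as a low-coverage draft.
    """
    if draft.coverage >= low_cov or not revisions:
        return None  # draft is already good enough, or nothing to compare
    best = max(revisions, key=lambda r: r.coverage)
    if best.coverage <= draft.coverage:
        return None  # no revision recovers coverage; drop the sample
    return draft, best
```

Retaining only improving pairs concentrates the training signal on recovery behavior, which matches the caption's stated goal; the specific cut-off and the strict-improvement test are illustrative choices.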