Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties

Denis Schwachhofer; Peter Domanski; Steffen Becker; Stefan Wagner; Matthias Sauer; Dirk Pflüger; Ilia Polian

TL;DR

This work addresses the challenge of generating test programs that target non-functional properties of black-box Devices under Test (DUTs) in System-Level Test (SLT). It investigates whether a pre-trained LLM, specifically Code Llama 13B, can automatically produce C code snippets that maximize the instructions per cycle (IPC) of a BOOM processor simulated in gem5, guided by carefully crafted prompts and by hyperparameter optimization with Optuna rather than by model fine-tuning. The experiments report IPC values of up to 0.799607 and competitive pass rates (e.g., 79.96% at top-1, 99.97% at top-5), although many generated snippets fail to compile or crash the simulator, and the effects of prompt optimization are inconsistent. The results indicate that LLM-driven SLT code generation for non-functional objectives is feasible, and they point to future work on fine-tuning and more robust prompt strategies to improve the reliability and coverage of SLT programs.

Abstract

System-Level Test (SLT) has been part of the test flow for integrated circuits for over a decade and continues to gain importance. However, no systematic approaches exist for test program generation, especially for programs that target non-functional properties of the Device under Test (DUT). Currently, test engineers manually compose test suites from off-the-shelf software that approximates the end-user environment of the DUT. This is a challenging and tedious task that does not guarantee sufficient control over non-functional properties. This paper proposes using Large Language Models (LLMs) to generate test programs. We take a first look at how pre-trained LLMs perform in test program generation when the goal is to optimize non-functional properties of the DUT. To this end, we write a prompt to generate C code snippets that maximize the instructions per cycle of a super-scalar, out-of-order architecture in simulation. Additionally, we apply prompt and hyperparameter optimization to achieve the best possible results without further training.
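
The scoring step implied above can be illustrated with a minimal sketch. It assumes that each generated snippet either fails to compile, crashes the simulator, or yields an instruction count and a cycle count, and that failed snippets simply score zero. The `SnippetResult` record, the helper names, and the sample numbers are hypothetical and are not taken from the paper's tooling or results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnippetResult:
    """Hypothetical record for one generated C snippet after compile + simulation."""
    name: str
    compiled: bool
    instructions: Optional[int] = None  # committed instructions; None if the run did not finish
    cycles: Optional[int] = None        # simulated cycles; None if the run did not finish


def ipc(result: SnippetResult) -> float:
    """Instructions per cycle; 0.0 for snippets that failed to compile or crashed."""
    if not result.compiled or result.instructions is None or not result.cycles:
        return 0.0
    return result.instructions / result.cycles


def best_snippet(results: List[SnippetResult]) -> Optional[SnippetResult]:
    """Return the snippet with the highest IPC, or None if no snippet ran successfully."""
    valid = [r for r in results if ipc(r) > 0.0]
    if not valid:
        return None
    return max(valid, key=ipc)


# Illustrative, made-up outcomes (assumption: not measured data).
results = [
    SnippetResult("snippet_a", compiled=False),                 # compile error
    SnippetResult("snippet_b", compiled=True),                  # simulator crashed, no stats
    SnippetResult("snippet_c", compiled=True, instructions=1500, cycles=2000),
    SnippetResult("snippet_d", compiled=True, instructions=600, cycles=1000),
]

best = best_snippet(results)
print(best.name, ipc(best))  # prints "snippet_c 0.75"
```

Scoring failed snippets as zero instead of discarding them keeps every generation attempt visible to an outer optimization loop, which is one plausible way to penalize prompts or sampling settings that often produce unusable code.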
