Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Rabimba Karanjai; Aftab Hussain; Md Rafiqul Islam Rabin; Lei Xu; Weidong Shi; Mohammad Amin Alipour

TL;DR

The paper investigates whether two OpenAI LLMs, Davinci and ChatGPT, can automatically generate unit tests for C++ HPC code that uses OpenMP and MPI. It builds a benchmark from seven HPC projects and evaluates three prompting strategies (out-of-the-box, template-guided, and context-guided) across 216 generated tests (648 test cases), assessing compilation, coverage, correctness, and test smells. Context-guided generation substantially improves compilability and coverage, but fully correct tests and robust checks of hierarchical parallelism remain challenging; recurring issues include missing pragmas and repetitive assertions. The work demonstrates that LLM-assisted unit test generation for HPC is feasible, highlights the post-processing that is needed in practice, and offers guidelines for designing prompts and contexts that better capture parallelism and library semantics.

Abstract

Unit testing is crucial in software engineering for ensuring quality. However, it is not widely used in parallel and high-performance computing software, particularly scientific applications, because of their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive: it requires specialized knowledge, and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software that accounts for its distinctive features, such as complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.
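To make concrete what a unit test for a C++ parallel program can look like, here is a minimal sketch. It is not taken from the paper's benchmark or from any generated output: the functions `parallel_sum` and `serial_sum` and both test functions are hypothetical names chosen for illustration, and plain `assert` stands in for whatever test framework a project uses. The test compares an OpenMP reduction against a serial reference and a known closed-form value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function under test: sums a vector with an OpenMP reduction.
// If the compiler is run without OpenMP support, the pragma is ignored and
// the loop runs serially, so the result is unchanged.
double parallel_sum(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}

// Serial reference implementation used as the test oracle.
double serial_sum(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;
    }
    return total;
}

// Hypothetical test: the parallel result must match both the closed-form
// value 1 + 2 + ... + 1000 = 500500 and the serial reference. The inputs are
// small integers, so the sum is exact in double precision in any order.
void test_parallel_sum_matches_serial() {
    std::vector<double> values(1000);
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] = static_cast<double>(i + 1);
    }
    assert(parallel_sum(values) == 500500.0);
    assert(parallel_sum(values) == serial_sum(values));
}

// Hypothetical edge-case test: an empty input sums to zero.
void test_parallel_sum_empty_input() {
    const std::vector<double> empty;
    assert(parallel_sum(empty) == 0.0);
}
```

The two test functions can be called from a `main` function or registered with a test framework. Dropping the `#pragma omp` line would leave these tests passing, which illustrates why a missing pragma is hard to catch with value-based assertions alone.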
