HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation
Rabimba Karanjai, Lei Xu, Weidong Shi
TL;DR
HPC unit testing is hindered by parallelism and diverse environments. HPCAgentTester introduces a multi-agent LLM workflow in which a Recipe Agent and a Test Agent collaborate through an iterative critique loop to generate compilable and functionally correct OpenMP and MPI unit tests. The system leverages a Code Analyzer, an HPC Bug Knowledge Graph, and domain-specific fine-tuned models to create structured Test Recipes that guide targeted test generation, with automatic refinement and human-in-the-loop oversight as needed. Experimental results show improved compilation rates, correctness, and parallel construct coverage compared with standalone LLM baselines, illustrating a scalable approach to reliable parallel software in HPC contexts.
Abstract
Unit testing in High-Performance Computing (HPC) is critical but is complicated by parallelism, complex algorithms, and diverse hardware. Traditional methods often fail to address non-deterministic behavior and synchronization issues in HPC applications. This paper introduces HPCAgentTester, a novel multi-agent Large Language Model (LLM) framework designed to automate and enhance unit test generation for HPC software that uses OpenMP and MPI. HPCAgentTester employs a collaborative workflow in which two specialized LLM agents, a Recipe Agent and a Test Agent, iteratively generate and refine test cases through a critique loop. This architecture enables the generation of context-aware unit tests that specifically target parallel execution constructs, complex communication patterns, and hierarchical parallelism. We demonstrate HPCAgentTester's ability to produce compilable and functionally correct tests for OpenMP and MPI primitives, effectively identifying subtle bugs that are often missed by conventional techniques. Our evaluation shows that HPCAgentTester significantly improves test compilation rates and correctness compared to standalone LLMs, offering a more robust and scalable solution for ensuring the reliability of parallel software systems.
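The critique loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names TestRecipe, critique_loop, stub_generate, and stub_critique are hypothetical, and the two stub functions are deterministic stand-ins for the LLM-backed Test Agent and its critic. The sketch assumes only what the abstract states, namely that a structured recipe guides generation and that a test is regenerated until a critique raises no further issues or a round limit is reached.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TestRecipe:
    """Hypothetical structured plan for one unit test."""
    target: str            # function under test
    constructs: List[str]  # parallel constructs the test must exercise


def critique_loop(
    recipe: TestRecipe,
    generate: Callable[[TestRecipe, List[str]], str],
    critique: Callable[[TestRecipe, str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Generate a test, critique it, and regenerate until no issues remain.

    Returns (test_code, rounds_used, accepted). accepted is False when the
    round limit is hit, which is where a human reviewer could step in.
    """
    feedback: List[str] = []
    test_code = ""
    for round_no in range(1, max_rounds + 1):
        test_code = generate(recipe, feedback)
        feedback = critique(recipe, test_code)
        if not feedback:
            return test_code, round_no, True
    return test_code, max_rounds, False


def stub_generate(recipe: TestRecipe, feedback: List[str]) -> str:
    """Stand-in for the Test Agent (assumption: a real agent would call an LLM).

    Covers the first construct by default, plus anything the critique flagged.
    """
    wanted = [recipe.constructs[0]]
    wanted += [c for c in feedback if c != recipe.constructs[0]]
    body = "\n".join(f"// exercises: {c}" for c in wanted)
    return f"// unit test for {recipe.target}\n{body}\n"


def stub_critique(recipe: TestRecipe, test_code: str) -> List[str]:
    """Stand-in critic: report every required construct missing from the test."""
    return [c for c in recipe.constructs if c not in test_code]


if __name__ == "__main__":
    recipe = TestRecipe(
        target="sum_array",
        constructs=["#pragma omp parallel for", "reduction(+:sum)"],
    )
    code, rounds, accepted = critique_loop(recipe, stub_generate, stub_critique)
    print(f"accepted={accepted} after {rounds} round(s)")
    print(code)
```

In a real system the critique step would presumably also compile and run the generated test. The sketch omits that step and checks construct coverage only, so that it stays self-contained.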
