CUBETESTERAI: Automated JUnit Test Generation using the LLaMA Model
Daniele Gorla, Shivam Kumar, Pietro Nicolaus Roselli Lorenzini, Alireza Alipourfaz
TL;DR
The paper tackles automated generation of JUnit tests for Java/Spring Boot applications by leveraging the LLaMA large language model within CubeTesterAI. It presents an end-to-end system that combines a user-friendly web interface, a GitLab/Docker CI/CD pipeline, and RunPod-backed LLaMA execution to generate, refine, and validate tests, including support for private-method testing via indirect approaches. Experimental results show that LLaMA-70B-Instruct-Gradient-1048k delivers strong code coverage and favorable comparisons against ChatGPT and other state-of-the-art tools, with a transparent cost model of around 4€ per 100 lines. The work contributes a privacy-aware, scalable workflow and outlines future directions such as efficiency optimizations, collaborative features, adaptive learning, richer error analysis, test oracles, and multilingual expansion to broaden applicability.
Abstract
This paper presents an approach to automating JUnit test generation for Java applications using the Spring Boot framework, leveraging the LLaMA (Large Language Model Meta AI) model to enhance the efficiency and accuracy of the testing process. The resulting tool, called CUBETESTERAI, includes a user-friendly web interface and integrates a CI/CD pipeline built on GitLab and Docker. These components streamline the automated test generation process, allowing developers to generate JUnit tests directly from their code snippets with minimal manual intervention. The final implementation executes the LLaMA models through RunPod, an online GPU service, which also enhances the privacy of our tool. Using the advanced natural language processing capabilities of the LLaMA model, CUBETESTERAI generates test cases that provide high code coverage and accurate validation of software functionalities in Java-based Spring Boot applications. Furthermore, it efficiently manages resource-intensive operations and refines the generated tests to address common issues such as missing imports and the handling of private methods. By comparing CUBETESTERAI with several state-of-the-art tools, we show that our proposal consistently achieves competitive and, in many cases, better code coverage on different real-life Java programs.
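To make the refinement of missing imports concrete, the following is a minimal sketch of one way such a post-processing step could work. It is not the implementation used in CUBETESTERAI: the class name `ImportRepair`, the method `addMissingImports`, and the small symbol-to-import table are all hypothetical and chosen only for illustration. The sketch scans a generated test for a few well-known JUnit 5 and Mockito symbols and inserts any import that is used but not declared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-processing step: add imports a generated test uses but does not declare. */
final class ImportRepair {

    // Assumed lookup table (illustrative only); a real tool would need far more entries.
    private static final Map<String, String> KNOWN_IMPORTS = new LinkedHashMap<>();
    static {
        KNOWN_IMPORTS.put("@Test", "import org.junit.jupiter.api.Test;");
        KNOWN_IMPORTS.put("@BeforeEach", "import org.junit.jupiter.api.BeforeEach;");
        KNOWN_IMPORTS.put("assertEquals",
                "import static org.junit.jupiter.api.Assertions.assertEquals;");
        KNOWN_IMPORTS.put("@Mock", "import org.mockito.Mock;");
    }

    private static final Pattern PACKAGE_DECL =
            Pattern.compile("(?m)^\\s*package\\s+[\\w.]+\\s*;");

    private ImportRepair() { }

    /** Returns the source with every missing known import inserted after the package line. */
    static String addMissingImports(String source) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : KNOWN_IMPORTS.entrySet()) {
            // The trailing \b keeps "@Mock" from matching "@MockBean".
            Pattern symbol = Pattern.compile(Pattern.quote(entry.getKey()) + "\\b");
            boolean used = symbol.matcher(source).find();
            // Exact-line check only: a wildcard import is not recognised, so a
            // redundant (but harmless) single import may be added in that case.
            boolean declared = source.contains(entry.getValue());
            if (used && !declared) {
                missing.add(entry.getValue());
            }
        }
        if (missing.isEmpty()) {
            return source;
        }
        String block = String.join("\n", missing) + "\n";
        Matcher pkg = PACKAGE_DECL.matcher(source);
        if (pkg.find()) {
            // Keep the package declaration first, then the new imports.
            int end = pkg.end();
            return source.substring(0, end) + "\n\n" + block + source.substring(end);
        }
        // No package declaration: the imports go at the top of the file.
        return block + "\n" + source;
    }
}
```

Calling `ImportRepair.addMissingImports(generatedSource)` on a test that uses `@Test` and `assertEquals` without importing them returns the same source with the two imports placed after the package declaration. A lookup table like this only covers symbols known in advance, so a production pipeline would more plausibly rely on compiler diagnostics to decide which imports are missing.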
