Table of Contents
Fetching ...

UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification

Yiannis Charalambous, Claudionor N. Coelho, Luis Lamb, Lucas C. Cordeiro

TL;DR

The paper addresses the challenge of testing and maintaining legacy software by introducing UnitTenX, an AI-agent framework that integrates formal verification with LLM-driven test generation to automatically produce unit tests for undocumented C interfaces. By combining ESBMC-based symbolic analysis with transformer-based code generation, UnitTenX identifies sensitization conditions and edge cases to maximize code coverage and reveal crash states while generating regression-oriented tests. Experimental evaluation on a real-world DNS server demonstrates substantial coverage improvements, robust error handling, and a productive reflection loop that incrementally enhances test quality. The approach has practical impact for software maintenance and modernization, offering a path to higher reliability and clearer documentation for legacy codebases, though broader validation across diverse systems is needed.

Abstract

This paper introduces UnitTenX, a state-of-the-art open-source AI multi-agent system designed to generate unit tests for legacy code, enhancing test coverage and critical value testing. UnitTenX leverages a combination of AI agents, formal methods, and Large Language Models (LLMs) to automate test generation, addressing the challenges posed by complex and legacy codebases. Despite the limitations of LLMs in bug detection, UnitTenX offers a robust framework for improving software reliability and maintainability. Our results demonstrate the effectiveness of this approach in generating high-quality tests and identifying potential issues. Additionally, our approach enhances the readability and documentation of legacy code.

UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification

TL;DR

The paper addresses the challenge of testing and maintaining legacy software by introducing UnitTenX, an AI-agent framework that integrates formal verification with LLM-driven test generation to automatically produce unit tests for undocumented C interfaces. By combining ESBMC-based symbolic analysis with transformer-based code generation, UnitTenX identifies sensitization conditions and edge cases to maximize code coverage and reveal crash states while generating regression-oriented tests. Experimental evaluation on a real-world DNS server demonstrates substantial coverage improvements, robust error handling, and a productive reflection loop that incrementally enhances test quality. The approach has practical impact for software maintenance and modernization, offering a path to higher reliability and clearer documentation for legacy codebases, though broader validation across diverse systems is needed.

Abstract

This paper introduces UnitTenX, a state-of-the-art open-source AI multi-agent system designed to generate unit tests for legacy code, enhancing test coverage and critical value testing. UnitTenX leverages a combination of AI agents, formal methods, and Large Language Models (LLMs) to automate test generation, addressing the challenges posed by complex and legacy codebases. Despite the limitations of LLMs in bug detection, UnitTenX offers a robust framework for improving software reliability and maintainability. Our results demonstrate the effectiveness of this approach in generating high-quality tests and identifying potential issues. Additionally, our approach enhances the readability and documentation of legacy code.

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures.

Figures (6)

  • Figure 1: Shows each component of UnitTenX along with the flow of data. The orange nodes denote steps that use LLM to process data.
  • Figure 2: C Package Problems
  • Figure 3: Tests are Regression Units
  • Figure 4: Regression Units Can Detect Changes
  • Figure 5: RQ1: UnitTenX Initial test coverage rating against final.
  • ...and 1 more figures