From Code Generation to Software Testing: AI Copilot with Context-Based RAG
Yuchen Wang, Shangxin Guo, Chee Wei Tan
TL;DR
The paper addresses bottlenecks in software testing under rapid development by reframing bug detection and coding with fewer bugs as interconnected problems with a shared goal. It proposes Copilot for Testing, a context-based Retrieval-Augmented Generation (RAG) system that synchronizes bug detection, fix suggestions, and automated test generation with codebase updates. Using a graph-based code context embedding and a JSON-based prompt constructor, the approach yields a 31.2% improvement in bug detection accuracy, a 12.6% increase in critical test coverage, and a 10.5% higher acceptance rate for code suggestions, while reducing execution time per bug. The work demonstrates the practicality of AI-driven validation in development environments and lays a foundation for extending context-aware testing to multiple IDEs and languages.
Abstract
The rapid pace of large-scale software development places increasing demands on traditional testing methodologies, often leading to bottlenecks in efficiency, accuracy, and coverage. We propose a novel perspective on software testing by positing bug detection and coding with fewer bugs as two interconnected problems that share a common goal: reducing bugs with limited resources. We extend our previous work on AI-assisted programming, which supports code auto-completion and chatbot-powered Q&A, to the realm of software testing. We introduce Copilot for Testing, an automated testing system that synchronizes bug detection with codebase updates, leveraging context-based Retrieval-Augmented Generation (RAG) to enhance the capabilities of large language models (LLMs). Our evaluation demonstrates a 31.2% improvement in bug detection accuracy, a 12.6% increase in critical test coverage, and a 10.5% higher user acceptance rate, highlighting the transformative potential of AI-driven technologies in modern software development practices.
