From Code Generation to Software Testing: AI Copilot with Context-Based RAG

Yuchen Wang, Shangxin Guo, Chee Wei Tan

TL;DR

The paper addresses bottlenecks in software testing amid rapid development by reframing bug detection and writing code with fewer bugs as interconnected goals. It proposes Copilot for Testing, a context-based Retrieval Augmented Generation (RAG) system that synchronizes bug detection, fix suggestions, and automated test generation with codebase updates. Using a graph-based code context embedding and a JSON-based prompt constructor, the approach yields a 31.2% improvement in bug detection accuracy, a 12.6% gain in critical test coverage, and a 10.5% higher acceptance rate for code suggestions, with reduced per-bug execution time. The work demonstrates the practicality of AI-driven validation in development environments and lays a foundation for extending context-aware testing to multiple IDEs and languages.

Abstract

The rapid pace of large-scale software development places increasing demands on traditional testing methodologies, often leading to bottlenecks in efficiency, accuracy, and coverage. We propose a novel perspective on software testing by positing bug detection and coding with fewer bugs as two interconnected problems that share a common goal, which is reducing bugs with limited resources. We extend our previous work on AI-assisted programming, which supports code auto-completion and chatbot-powered Q&A, to the realm of software testing. We introduce Copilot for Testing, an automated testing system that synchronizes bug detection with codebase updates, leveraging context-based Retrieval Augmented Generation (RAG) to enhance the capabilities of large language models (LLMs). Our evaluation demonstrates a 31.2% improvement in bug detection accuracy, a 12.6% increase in critical test coverage, and a 10.5% higher user acceptance rate, highlighting the transformative potential of AI-driven technologies in modern software development practices.

Paper Structure

This paper contains 17 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Architecture of Copilot for Testing. The system proactively observes the local coding environment, retrieves code context, and generates context-aware prompts to interact with cloud-based LLMs and provide bug fix suggestions synchronized with codebase updates, aiming at more efficient and effective software testing with higher accuracy and coverage.
  • Figure 2: Overview of the context-based RAG module. The codebase is modeled as a graph, with individual nodes of code context embeddings. These embeddings are dynamically updated based on code changes initiated by the user and the outputs from LLMs. The toy example shows that when a change is made to node $s_2$, its embedding is updated, followed by updates to its neighboring nodes. The updated embeddings, which carry contextual information, are then utilized in prompt construction for the LLMs.
  • Figure 3: Sequence diagram of Copilot for Testing. The diagram illustrates how Copilot for Testing enables real-time, auto-synchronized testing through context-based RAG that leverages LLMs. Copilot for Testing receives notifications upon code updates, retrieves contextual information, and then constructs prompts with the proposed RAG module. Upon receiving the suggestions, the user can adopt the recommendations and apply the changes directly to the codebase.
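
The update flow described for Figure 2 (a change to one node refreshes its embedding, then those of its neighbors, and the refreshed context feeds a JSON prompt) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the names CodeNode, ContextGraph, toy_embed, and build_prompt, the blending weight alpha, the one-hop propagation rule, and the JSON field names are all hypothetical, and toy_embed is a character-histogram stand-in for a real embedding model.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Set


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for an embedding model: a normalized histogram."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    total = sum(vec) or 1.0
    return [v / total for v in vec]


@dataclass
class CodeNode:
    name: str
    source: str
    embedding: List[float] = field(default_factory=list)
    neighbors: Set[str] = field(default_factory=set)


class ContextGraph:
    """Codebase modeled as an undirected graph of code snippets (assumed shape)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, CodeNode] = {}

    def add_node(self, name: str, source: str) -> None:
        self.nodes[name] = CodeNode(name, source, toy_embed(source))

    def add_edge(self, a: str, b: str) -> None:
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    def update(self, name: str, new_source: str, alpha: float = 0.5) -> List[str]:
        """Re-embed the changed node, then blend it into its direct neighbors.

        Returns the names of all nodes whose embeddings were recomputed.
        The one-hop blend with weight alpha is an assumed propagation rule.
        """
        node = self.nodes[name]
        node.source = new_source
        node.embedding = toy_embed(new_source)
        touched = [name]
        for nb_name in sorted(node.neighbors):
            nb = self.nodes[nb_name]
            own = toy_embed(nb.source)
            nb.embedding = [
                (1 - alpha) * o + alpha * c for o, c in zip(own, node.embedding)
            ]
            touched.append(nb_name)
        return touched


def build_prompt(graph: ContextGraph, changed: str) -> str:
    """Serialize the changed node and its neighbors as a JSON prompt payload.

    The field names here are hypothetical, not the paper's schema.
    """
    node = graph.nodes[changed]
    payload = {
        "task": "detect_bugs_and_suggest_fixes",
        "changed_node": {"name": node.name, "source": node.source},
        "context": [
            {"name": n, "source": graph.nodes[n].source}
            for n in sorted(node.neighbors)
        ],
    }
    return json.dumps(payload, indent=2)
```

In this sketch, calling update("s2", ...) on a graph where s2 is linked to s1 and s3 recomputes exactly those three embeddings and leaves unconnected nodes alone, mirroring the toy example in the Figure 2 caption. A real system would likely use a learned embedding model and may propagate further than one hop.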