Generative AI for Test Driven Development: Preliminary Results

Moritz Mock, Jorge Melegati, Barbara Russo

TL;DR

Test-driven development promises high-quality code but has limited adoption due to extra coding effort and required expertise. The authors explore generative AI as an automation aid for TDD by defining collaborative and fully-automated interaction patterns, implementing an incremental, prompt-driven workflow, and evaluating with five developers against a non-AI baseline. The study finds GenAI can support TDD under supervision, but outputs may be buggy or misaligned with tests, especially for non-experts, highlighting the need for quality checks and human-in-the-loop control. The work provides actionable guidance on prompt design, developer roles, and integration of GenAI into iterative TDD workflows, paving the way for broader adoption and future research.

Abstract

Test Driven Development (TDD) is one of the major practices of Extreme Programming, in which incremental testing and refactoring drive code development. TDD has limited adoption in industry because it requires writing more code and relies on experienced developers. Generative AI (GenAI) may reduce the extra effort imposed by TDD. In this work, we introduce an approach to automate TDD with GenAI using one of two interaction patterns: a collaborative pattern, in which developers create tests and supervise the AI generation during each iteration, or a fully-automated pattern, in which developers supervise the AI generation only at the end of the iterations. We run an exploratory experiment with ChatGPT in which the interaction patterns are compared with non-AI TDD regarding test and code quality and development speed. Overall, we found that, for our experiment and settings, GenAI can be efficiently used in TDD, but it requires supervision of the quality of the produced code. In some cases, it can even mislead non-expert developers and propose solutions just for the sake of the query.
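
To make the fully-automated pattern more concrete, the following is a minimal sketch of an incremental test-then-generate loop. It is not the authors' implementation: the names `fully_automated_tdd` and `stub_generate`, the retry limit, and the FizzBuzz task are all assumptions chosen for illustration, and the GenAI call is replaced by a local stub so the example runs without any external service.

```python
from typing import Callable, List

# A test is modelled as a function that receives the candidate implementation
# and returns True when it passes (a simplification of a real test runner).
Candidate = Callable[[int], str]
Test = Callable[[Candidate], bool]


def fully_automated_tdd(
    tests: List[Test],
    generate: Callable[[List[Test], int], Candidate],
    max_attempts: int = 3,
) -> Candidate:
    """Add tests one at a time and regenerate code until all tests so far pass."""
    active: List[Test] = []
    code = None
    for test in tests:
        active.append(test)  # one TDD iteration per newly added test
        for attempt in range(max_attempts):
            code = generate(active, attempt)  # hypothetical GenAI call
            if all(t(code) for t in active):
                break  # every test seen so far passes; move to the next one
        else:
            # No candidate passed within the retry budget: hand over to a human.
            raise RuntimeError("no passing candidate; human review needed")
    return code  # the developer inspects the result only after the last iteration


def stub_generate(active: List[Test], attempt: int) -> Candidate:
    """Toy stand-in for the model: a more complete FizzBuzz as tests accumulate."""
    n_tests = len(active)

    def candidate(n: int) -> str:
        if n_tests >= 3 and n % 15 == 0:
            return "FizzBuzz"
        if n_tests >= 2 and n % 5 == 0:
            return "Buzz"
        if n % 3 == 0:
            return "Fizz"
        return str(n)

    return candidate


tests = [
    lambda f: f(3) == "Fizz",
    lambda f: f(5) == "Buzz",
    lambda f: f(15) == "FizzBuzz",
]
final = fully_automated_tdd(tests, stub_generate)
```

Under the same assumptions, the collaborative pattern would differ mainly in where the human sits: the developer would write each test and review the generated code inside the loop, rather than only inspecting `final` at the end.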

Paper Structure (5 sections, 1 figure, 3 tables)

Figures (1)

  • Figure 1: Fully-automated pattern