Does Co-Development with AI Assistants Lead to More Maintainable Code? A Registered Report

Markus Borg, Dave Hewett, Donald Graham, Noric Couderc, Emma Söderberg, Luke Church, Dave Farley

TL;DR

This registered report addresses whether co-developing with AI assistants affects software maintainability by evaluating how AI involvement in Phase 1 influences the ease of subsequent code evolution in Phase 2. It employs a two-phase design: in Phase 1, developers add a feature with or without AI help; Phase 2 is an RCT in which new developers evolve the Phase 1 solutions without AI. The measured outcomes are Completion Time, Code Health, Test Coverage, and Perceived Productivity. The study combines Bayesian analysis with frequentist hypothesis testing and uses a DAG-based framework to account for confounders, aiming to produce actionable evidence on the maintainability implications of AI-assisted software development. The replication package and realistic, industry-aligned tasks enhance external validity and provide practical guidance for practitioners considering AI assistants in long-term maintenance tasks.

Abstract

[Background/Context] AI assistants like GitHub Copilot are transforming software engineering; several studies have highlighted productivity improvements. However, their impact on code quality, particularly in terms of maintainability, requires further investigation. [Objective/Aim] This study aims to examine the influence of AI assistants on software maintainability, specifically assessing how these tools affect the ability of developers to evolve code. [Method] We will conduct a two-phase controlled experiment involving professional developers. In Phase 1, developers will add a new feature to a Java project, with or without the aid of an AI assistant. Phase 2, a randomized controlled trial, will involve a different set of developers evolving random Phase 1 projects while working without AI assistants. We will employ Bayesian analysis to evaluate differences in completion time, perceived productivity, code quality, and test coverage.

Paper Structure (9 sections, 3 figures, 3 tables)

This paper contains 9 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Goal of the study, outlined using the GQM structure.
  • Figure 2: Overview of the study. The part in the yellow box, to which about 50% of our participants will be assigned, constitutes the RCT.
  • Figure 3: DAGitty causal graph. Dev1 and Dev2 represent the full complexity of the human participants in Phases 1 and 2, respectively. Code1 and Code2 are the participants' solutions after Phases 1 and 2, respectively. $AI\_use$ is the independent variable. The other variables are explained in Table \ref{tab:confounders}.