Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability

Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, Dave Farley

TL;DR

The paper investigates whether AI-assisted co-development affects software maintainability, focusing on downstream evolution: new developers evolve the code created in Phase 1, this time without AI assistance. Using a preregistered two-phase design with 151 participants, it measures completion time, CodeHealth, test coverage, and perceived productivity through frequentist and Bayesian analyses, plus qualitative data. Phase 2 shows no robust evidence that AI-assisted code yields faster evolution or higher code quality, though habitual AI users show small, uncertain CodeHealth gains and Phase 1 results reveal notable speedups. The study highlights risks such as potential code volume growth and cognitive debt, and calls for future work on long-term, agentic-AI impacts and knowledge-management strategies in AI-enabled software teams.

Abstract

[Context] AI assistants, like GitHub Copilot and Cursor, are transforming software engineering. While several studies highlight productivity improvements, their impact on maintainability requires further investigation. [Objective] This study investigates whether co-development with AI assistants affects software maintainability, specifically how easily other developers can evolve the resulting source code. [Method] We conducted a two-phase controlled experiment involving 151 participants, 95% of whom were professional developers. In Phase 1, participants added a new feature to a Java web application, with or without AI assistance. In Phase 2, a randomized controlled trial, new participants evolved these solutions without AI assistance. [Results] Phase 2 revealed no significant differences in subsequent evolution with respect to completion time or code quality. Bayesian analysis suggests that any speed or quality improvements from AI use were at most small and highly uncertain. Observational results from Phase 1 corroborate prior research: using an AI assistant yielded a 30.7% median reduction in completion time, and habitual AI users showed an estimated 55.9% speedup. [Conclusions] Overall, we did not detect systematic maintainability advantages or disadvantages when other developers evolved code co-developed with AI assistants. Within the scope of our tasks and measures, we observed no consistent warning signs of degraded code-level maintainability. Future work should examine risks such as code bloat from excessive code generation and cognitive debt as developers offload more mental effort to assistants.

Paper Structure

This paper contains 77 sections, 6 equations, 24 figures, 12 tables.

Figures (24)

  • Figure 1: Conceptual relationship between maintainability (artifact-centric) and productivity (developer-centric). ISO 25010 frames maintainability, SPACE frames productivity, and artifact X links the two through the maintenance activity.
  • Figure 2: Goal of the study, outlined using the GQM structure.
  • Figure 3: Overview of the study. The part in the yellow box, to which about 50% of the participants were assigned, constituted the RCT.
  • Figure 4: DAGitty causal graph. $Dev1$ and $Dev2$ represent the full complexity of the human participants in Phases 1 and 2, respectively. $Code1$ and $Code2$ are the participants' solutions after Phases 1 and 2, respectively. $AI\_use$ is the independent variable. The other variables are explained in Table \ref{tab:covariates}.
  • Figure 5: A diagram explaining an ordered logistic regression model. Given the two predictors $x$ and $y$ and the observed responses (observed), the model infers both a latent score (latent) for each point (as a linear combination of $x$ and $y$) and cutoffs between the different response levels. In other words, the model finds hyperplanes separating each response level from the next, where all hyperplanes are parallel.
  • ...and 19 more figures
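
The ordered logistic regression model described in the Figure 5 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the weights `w_x` and `w_y`, the cutoff values, and the function name `ordered_logit_probs` are hypothetical and chosen only for illustration, and they are fixed here, whereas a fitted model would infer them from data. The sketch computes the latent score as a linear combination of the two predictors and turns the cutoffs into a probability for each ordered response level.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, y, w_x, w_y, cutoffs):
    """Return P(response = k) for each ordered level k.

    The weights and cutoffs are illustrative assumptions, not values
    from the paper. `cutoffs` must be sorted in increasing order;
    K - 1 cutoffs yield K response levels.
    """
    # Latent score: a linear combination of the two predictors.
    latent = w_x * x + w_y * y

    # Cumulative probabilities P(response <= k) at each cutoff,
    # closed off with 1.0 for the highest level.
    cumulative = [sigmoid(c - latent) for c in cutoffs] + [1.0]

    # Per-level probabilities are differences of adjacent cumulatives.
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


# Hypothetical example: three cutoffs give four response levels.
probs = ordered_logit_probs(x=1.0, y=2.0, w_x=0.8, w_y=-0.3,
                            cutoffs=[-1.0, 0.5, 2.0])
higher = ordered_logit_probs(x=3.0, y=2.0, w_x=0.8, w_y=-0.3,
                             cutoffs=[-1.0, 0.5, 2.0])
```

Because every cutoff is compared against the same latent score, the boundaries between response levels are parallel hyperplanes in the $(x, y)$ plane, as the caption states. Raising the latent score (here by increasing `x` with a positive weight) moves probability mass toward the higher response levels.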