Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects

Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, Bogdan Vasilescu

TL;DR

This study asks whether an AI-assisted coding agent (Cursor) delivers sustained project-level productivity and code quality benefits. It uses a staggered adoption difference-in-differences design with propensity-score matching to compare 807 Cursor-adopting GitHub repositories to 1,380 matched controls, analyzing velocity (commits, lines added) and quality (static analysis warnings, code complexity, duplicate lines), followed by panel GMM to explore causality between velocity and quality. The authors find substantial but transient velocity gains, coupled with persistent increases in static analysis warnings and code complexity, and show via causal-path analysis that accumulated technical debt dampens future velocity. These results highlight a velocity–quality trade-off in AI-driven development and call for deliberate quality-assurance integration and design improvements in future AI coding tools to sustain gains.

Abstract

Large language models (LLMs) have demonstrated the promise to revolutionize the field of software engineering. Among other things, LLM agents are rapidly gaining momentum in their application to software development, with practitioners claiming a multifold productivity increase after adoption. Yet, empirical evidence is lacking around these claims. In this paper, we estimate the causal effect of adopting a widely popular LLM agent assistant, namely Cursor, on development velocity and software quality. The estimation is enabled by a state-of-the-art difference-in-differences design comparing Cursor-adopting GitHub projects with a matched control group of similar GitHub projects that do not use Cursor. We find that the adoption of Cursor leads to a significant, large, but transient increase in project-level development velocity, along with a significant and persistent increase in static analysis warnings and code complexity. Further panel generalized method of moments estimation reveals that the increase in static analysis warnings and code complexity acts as a major factor causing long-term velocity slowdown. Our study carries implications for software engineering practitioners, LLM agent assistant designers, and researchers.
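
As a rough illustration of the difference-in-differences idea described above, the simplest two-group, two-period version can be sketched as follows. This is not the staggered-adoption estimator used in the paper; it only shows the core comparison of the before/after change in adopting repositories against the before/after change in matched controls. The function name `did_log_effect`, the `log(1 + y)` transform, and all commit counts are hypothetical and chosen purely for illustration.

```python
import math


def did_log_effect(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on log(1 + y) outcomes.

    Each argument is a list of outcome values (e.g. monthly commit counts).
    The log(1 + y) transform is an assumption made for this sketch.
    """
    def mean_log(values):
        return sum(math.log1p(v) for v in values) / len(values)

    # Change over time within each group, on the log scale.
    treated_change = mean_log(treated_post) - mean_log(treated_pre)
    control_change = mean_log(control_post) - mean_log(control_pre)

    # The control group's change stands in for what would have happened
    # to the treated group without adoption (parallel-trends assumption).
    return treated_change - control_change


# Hypothetical monthly commit counts, made up for illustration only.
treated_pre = [10, 12, 8]
treated_post = [20, 25, 15]
control_pre = [9, 11, 10]
control_post = [10, 12, 11]

effect = did_log_effect(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate on the log scale: {effect:.3f}")
print(f"Rough relative change: {math.expm1(effect):.1%}")
```

Subtracting the control group's change removes trends shared by both groups, so the remaining difference is attributed to adoption only if the two groups would otherwise have moved in parallel.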

Paper Structure

This paper contains 29 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The Cursor adoption time of the 807 repositories in our study, which all have $\ge$10 stars and Cursor configuration files at the time of data collection (April 2025).
  • Figure 2: Row 1: The estimated dynamic treatment effects from 6 months before to 6 months after adoption. All outcome variables are log-transformed, as for the estimated average treatment effects in Table \ref{tab:average-te}. Note that month -1 was deliberately removed from the models to serve as a counterfactual baseline (for TWFE; regression results in the replication package) or to avoid potential anticipation effects (for Borusyak, see Table \ref{tab:average-te}; for Callaway, see the replication package). Row 2: Robustness check (Section \ref{sec:limitations}): repositories with high-confidence Cursor usage show stronger effects. Row 3: Robustness check (Section \ref{sec:limitations}): repositories where Cursor-file-tinkerers account for the majority of commit activity show slightly stronger effects. Row 4: Robustness check (Section \ref{sec:limitations}): repositories using Cursor show similar effects to those using Cursor and other agentic AI tools.
  • Figure 3: Our theory around how LLM agent assistants may impact software development. Solid lines show causal relationships supported by existing evidence, and dashed lines indicate relationships not fully supported in our data.