Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study

Viktoria Stray, Elias Goldmann Brandtzæg, Viggo Tellefsen Wivestad, Astri Barbala, Nils Brede Moe

TL;DR

This study investigates the real-world impact of GitHub Copilot on developer activity and perceived productivity at NAV IT, a large public sector agile organization, over two years using a longitudinal mixed-methods design. It combines 26,317 unique non-merge commits across 703 repositories with surveys and 13 interviews to compare 25 Copilot users with 14 non-users. Copilot adopters were already more active than non-users before the tool was introduced, and post-adoption changes in their commit-based activity were minor and not statistically significant. Perceived productivity nonetheless tended to increase for Copilot users, highlighting a discrepancy between subjective experience and commit-based metrics. The findings suggest that GenAI tools may primarily reduce cognitive load and improve workflow rather than dramatically boost raw code production, and they call for broader, multi-dimensional productivity metrics that capture developer well-being and flow in addition to output.

Abstract

This study investigates the real-world impact of the generative AI (GenAI) tool GitHub Copilot on developer activity and perceived productivity. We conducted a mixed-methods case study in NAV IT, a large public sector agile organization. We analyzed 26,317 unique non-merge commits from 703 of NAV IT's GitHub repositories over a two-year period, focusing on commit-based activity metrics from 25 Copilot users and 14 non-users. The analysis was complemented by survey responses on their roles and perceived productivity, as well as 13 interviews. Our analysis of activity metrics revealed that individuals who used Copilot were consistently more active than non-users, even prior to Copilot's introduction. We did not find any statistically significant changes in commit-based activity for Copilot users after they adopted the tool, although minor increases were observed. This suggests a discrepancy between changes in commit-based metrics and the subjective experience of productivity.

Paper Structure

This paper contains 11 sections and 4 figures.

Figures (4)

  • Figure 1: Self-reported roles of the 39 employees whose GitHub activity was analyzed.
  • Figure 2: Time series showing the average weekly commit activity among Copilot users and non-users. The top plot shows net lines changed (lines added - lines removed), while the lower plot shows the average commit frequency (i.e., the average number of commits per week). The red vertical line shows when GitHub Copilot was introduced in the organization. The shaded areas around each line represent the 95% confidence intervals.
  • Figure 3: Average weekly commit contributions for non-users and Copilot users for the periods before and after Copilot adoption. The error bars show the 95% confidence interval for all the weeks in each period.
  • Figure 4: Correlations between change in commits and perceived productivity. The y-axis represents the 5-point Likert scale responses for perceived change in productivity. The x-axis shows the change in average weekly activity.
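
The commit-based metrics named in the captions for Figures 2 and 3 (net lines changed, weekly commit frequency, and 95% confidence intervals) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the commit records, the developer names, and the helper functions `weekly_activity` and `mean_with_ci95` are hypothetical, and the normal-approximation interval is an assumed method that the paper may not have used.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, stdev

# Hypothetical commit records: (author, commit date, lines added, lines removed).
commits = [
    ("dev_a", date(2023, 1, 2), 120, 30),
    ("dev_a", date(2023, 1, 4), 10, 50),
    ("dev_b", date(2023, 1, 3), 40, 5),
    ("dev_a", date(2023, 1, 10), 15, 0),
    ("dev_b", date(2023, 1, 11), 60, 20),
    ("dev_b", date(2023, 1, 12), 5, 5),
]


def weekly_activity(records):
    """Aggregate commit count and net lines changed per (ISO year, ISO week, author)."""
    totals = defaultdict(lambda: {"commits": 0, "net_lines": 0})
    for author, day, added, removed in records:
        iso = day.isocalendar()
        key = (iso[0], iso[1], author)
        totals[key]["commits"] += 1
        # Net lines changed = lines added - lines removed (as in the Figure 2 caption).
        totals[key]["net_lines"] += added - removed
    return dict(totals)


def mean_with_ci95(values):
    """Return the mean and the half-width of a 95% CI (normal approximation, assumed)."""
    m = mean(values)
    if len(values) < 2:
        return m, 0.0
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, half_width


activity = weekly_activity(commits)
weekly_commit_counts = [entry["commits"] for entry in activity.values()]
avg_commits, ci_half_width = mean_with_ci95(weekly_commit_counts)
print(f"average weekly commits: {avg_commits:.2f} +/- {ci_half_width:.2f}")
```

Note that this sketch only counts developer-weeks with at least one commit; a fuller analysis would also need to decide how to treat weeks with no activity.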