Who is using AI to code? Global diffusion and impact of generative AI

Simone Daniotti, Johannes Wachs, Xiangnan Feng, Frank Neffke

TL;DR

The study builds a detector for AI-generated Python code, trained on labeled ground-truth data, and applies it to millions of GitHub commits across six countries to map the diffusion and productivity impact of genAI coding tools. Using a GraphCodeBERT-based classifier, country-specific corrections, and fixed-effects regressions, it finds that AI-generated code reached about 29% of US Python functions by 2024 and raised quarterly commit activity by roughly 3.6%. These gains are driven mainly by experienced programmers, and genAI use is also associated with exploration of new library domains. The paper also validates the detector on real-world code and on newer models, analyzes cross-country patterns, and estimates the broader economic value and potential welfare gains under different general-equilibrium scenarios, concluding that genAI's impact is substantial but highly heterogeneous. These results inform policymakers and researchers about diffusion barriers, the distributional consequences for skills, and the scale of productivity and innovation effects in software development.

Abstract

Generative coding tools promise big productivity gains, but uneven uptake could widen skill and income gaps. We train a neural classifier to spot AI-generated Python functions in over 30 million GitHub commits by 170,000 developers, tracking how fast -- and where -- these tools take hold. Today, AI writes an estimated 29% of Python functions in the US, a modest and shrinking lead over other countries. We estimate that quarterly output, measured in online code contributions, has increased by 3.6% because of this. Our evidence suggests that programmers using AI may also more readily expand into new domains of software development. However, experienced programmers capture nearly all of these productivity and exploration gains, widening rather than closing the skill gap.

Paper Structure

This paper contains 25 sections, 15 equations, 12 figures, 16 tables.

Figures (12)

  • Figure 1: Classifying functions written in the Python programming language as human- or AI-generated. A) Using a collection of human-generated code, we ask one LLM to describe the code in English, then another to implement that description as a Python function. B) We vectorize the resulting code using GraphCodeBERT, an embedding method that uses a code's tokens, comments, and the data flow between its variables. C) We train a neural network classifier combining GraphCodeBERT with a classification head to predict the human/AI labels. D) We evaluate the classifier on out-of-sample data and apply it to a large database of unlabeled Python functions.
  • Figure 2: Share of AI-generated Python functions over time. A: share of Python functions that were created or substantially changed by GitHub users in the United States. Vertical lines depict 95% confidence intervals. The plot reveals abrupt shifts in adoption coinciding with key AI-related events: the release of GitHub Copilot Preview, the public launch of ChatGPT, and the second wave of LLM releases (GPT-4 and related models). B: adoption in China, France, Germany, India and Russia (note that in China, GitHub competes with the alternative collaboration platform Gitee [gortmaker2024open]). We sampled 2,000 random programmers per country-year. The US curve is replicated from panel A as a point of reference. The US led the early adoption of genAI, followed by European nations such as France and Germany. From 2023 on, India rapidly catches up, whereas adoption in China and Russia progresses more slowly.
  • Figure 3: A) Intensity of genAI use by gender inferred from GitHub display names (US, 2024). B) Intensity of genAI use by user tenure (US, 2024). C) Estimated effect of genAI use on user activity from a user-quarter panel regression with user and quarter fixed-effects. GenAI use is associated with increased commit activity across all commits, multi-file commits ("Multi-File") that navigate project interdependencies, and commits adding library imports ("Imports"), which we interpret as adding new features. GenAI is also associated with wider ranges of individual libraries ("Indiv. Libs") and library combinations ("Combos"), and increased experimentation with new libraries or combinations. Results hold subsetting on the 5,000 most common library combinations ("Combos (Top 5k)") and coarsened library categories ("Combos (Groups)"). Error bars: 95% confidence intervals (standard errors clustered by user).
  • Figure S1: Location of GitHub users. A: Self-reported locations of GitHub users. B: Number of users in each country based on self-reported locations against IP addresses.
  • Figure S2: Detector prediction test. Evaluation of the trained detector on a test set. A: predicted probability that code was AI-generated for human-generated functions. B: predicted probability that code was AI-generated for AI-generated functions. C: Loss curve during the training of the detection model. D: ROC curve of the classifier.
  • ...and 7 more figures
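
The caption of Figure 3C refers to a user-quarter panel regression with user and quarter fixed effects. As a rough illustration of what such an estimator does, the sketch below applies the standard two-way within transformation to a balanced synthetic panel. It is not the paper's specification: the variable names (`ai_share`, `log_commits`), the data-generating numbers, and the restriction to a single regressor on a balanced panel are all assumptions made for the example, and the user-clustered standard errors mentioned in the caption are omitted.

```python
import numpy as np


def two_way_fe_slope(y, x):
    """Within estimator for y_it = a_i + g_t + beta * x_it + e_it.

    y, x: arrays of shape (n_users, n_quarters); the panel must be
    balanced (every user observed in every quarter).
    Returns the estimated beta.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    def demean(a):
        # Remove user means and quarter means, then add back the grand mean.
        return (a - a.mean(axis=1, keepdims=True)
                  - a.mean(axis=0, keepdims=True) + a.mean())

    y_w, x_w = demean(y), demean(x)
    return float((x_w * y_w).sum() / (x_w ** 2).sum())


# Synthetic data: hypothetical numbers, not taken from the paper.
rng = np.random.default_rng(0)
n_users, n_quarters, true_beta = 200, 12, 0.5
user_effect = rng.normal(size=(n_users, 1))        # a_i, fixed per user
quarter_effect = rng.normal(size=(1, n_quarters))  # g_t, fixed per quarter
# AI use is made to correlate with the user effect, so pooled OLS is biased.
ai_share = rng.uniform(0, 1, size=(n_users, n_quarters)) + 0.3 * user_effect
noise = rng.normal(scale=0.1, size=(n_users, n_quarters))
log_commits = user_effect + quarter_effect + true_beta * ai_share + noise

beta_hat = two_way_fe_slope(log_commits, ai_share)
naive = np.polyfit(ai_share.ravel(), log_commits.ravel(), 1)[0]
print(f"two-way FE estimate: {beta_hat:.3f}, pooled OLS: {naive:.3f}")
```

In this toy setup the pooled OLS slope is inflated because more productive users are also assumed to use AI more, while the fixed-effects estimate stays close to the value used to generate the data. That is the motivation for identifying the effect from within-user changes over time rather than from comparisons across users.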