AIDev: Studying AI Coding Agents on GitHub
Hao Li, Haoxiang Zhang, Ahmed E. Hassan
TL;DR
AIDev tackles the lack of large-scale real-world data on AI coding agents by assembling 932,791 Agentic-PRs, authored by five agents, from 116,211 GitHub repositories and 72,189 developers. A curated subset adds rich review, commit, and timeline data. The dataset enables systematic study of adoption, code quality, review dynamics, and risk in AI-assisted software engineering, with a relational schema and automated PR-purpose annotations to support cross-project analyses. It is available on Hugging Face and Zenodo, along with interactive exploration, reproducible notebooks, and a companion repo for pipelines. By linking PR metadata with reviews, commits, issues, and timelines, AIDev supports empirical investigation of how AI teammates integrate into real development workflows and which factors drive successful collaboration, reliability, and productivity gains in the wild.
Abstract
AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.
Keywords
AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering
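
The curated-subset criterion described in the abstract (Agentic-PRs from repositories with over 100 stars) can be illustrated with a small relational join. The sketch below uses toy in-memory tables rather than the real dataset; the table layout, the column names (`pr_id`, `agent`, `repo_id`, `stars`), and the helper `select_curated` are hypothetical and chosen only for illustration, so they should not be read as AIDev's actual schema or pipeline.

```python
import pandas as pd

# Toy stand-ins for a pull-request table and a repository table.
# Column names are illustrative assumptions, not the dataset's documented schema.
pull_requests = pd.DataFrame(
    {
        "pr_id": [1, 2, 3, 4],
        "agent": ["OpenAI Codex", "Devin", "GitHub Copilot", "Claude Code"],
        "repo_id": [10, 10, 20, 30],
    }
)
repositories = pd.DataFrame(
    {
        "repo_id": [10, 20, 30],
        "stars": [250, 100, 5000],
    }
)


def select_curated(prs: pd.DataFrame, repos: pd.DataFrame, min_stars: int = 100) -> pd.DataFrame:
    """Keep only PRs whose repository has strictly more than `min_stars` stars."""
    # Each PR belongs to exactly one repository, so the join is many-to-one.
    joined = prs.merge(repos, on="repo_id", how="inner", validate="many_to_one")
    return joined[joined["stars"] > min_stars].reset_index(drop=True)


curated = select_curated(pull_requests, repositories)

# Count curated PRs per agent, the kind of cross-project tally the dataset supports.
prs_per_agent = curated.groupby("agent").size().to_dict()
```

In this toy data, the PR in the repository with exactly 100 stars is excluded, because "over 100 stars" is read here as a strict inequality; that reading is an assumption about the threshold.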
