DevGPT: Studying Developer-ChatGPT Conversations

Tao Xiao, Christoph Treude, Hideaki Hata, Kenichi Matsumoto

TL;DR

DevGPT tackles the gap in understanding how software developers interact with ChatGPT by providing a large-scale dataset of publicly shared conversations linked to real development artifacts. It compiles 29,778 prompts and responses, including 19,106 code snippets, from conversations on GitHub and Hacker News across nine snapshots in 2023, tying them to source code, commits, issues, PRs, discussions, and threads. This resource enables analyses of developer queries, ChatGPT’s effectiveness in code generation and problem solving, and the broader impact of AI-assisted programming on software engineering workflows. By enabling exploration of prompt structures, code quality, reuse, and cross-artifact evolution, DevGPT supports researchers and practitioners in improving LLM-assisted development practices and tools.

Abstract

This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT, a prominent large language model (LLM). The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets, and is linked to corresponding software development artifacts such as source code, commits, issues, pull requests, discussions, and Hacker News threads. This comprehensive dataset is derived from shared ChatGPT conversations collected from GitHub and Hacker News, providing a rich resource for understanding the dynamics of developer interactions with ChatGPT, the nature of their inquiries, and the impact of these interactions on their work. DevGPT enables the study of developer queries, the effectiveness of ChatGPT in code generation and problem solving, and the broader implications of AI-assisted programming. By providing this dataset, the paper paves the way for novel research avenues in software engineering, particularly in understanding and improving the use of LLMs like ChatGPT by developers.

Paper Structure (8 sections, 1 figure, 1 table)

This paper contains 8 sections, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Example of a ChatGPT conversation in the context of a GitHub pull request