
An Empirical Study of ChatGPT-Related Projects and Their Issues on GitHub

Zheng Lin, Neng Zhang, Chao Liu, Zibin Zheng

TL;DR

This study analyzes ChatGPT-related GitHub projects by building a fine-grained categorization and examining user-reported issues. It collects 71,244 projects, keeps the top 200 by stars, and groups them into three main project categories. It then applies LDA to the 23,609 issues of those projects to derive ten issue topics, and assesses each topic's popularity, difficulty, and evolution over time using statistical trend analyses. Key findings show that issue dynamics differ across project categories and that shifts in topic importance track ChatGPT developments. These results give practical guidance for tagging projects on the platform and for prioritizing user-reported issues during development.

Abstract

Since the launch of ChatGPT in 2022, an increasing number of ChatGPT-related projects have been published on GitHub, sparking widespread discussion. However, GitHub does not provide a detailed classification of these projects to help users effectively explore the projects they are interested in. Additionally, the issues raised by users for these projects cover various aspects, e.g., installation, usage, and updates, so it would be valuable to help developers prioritize the more urgent issues and improve development efficiency. We retrieved 71,244 projects from GitHub using the keyword 'ChatGPT' and selected the 200 projects with the highest numbers of stars as a representative dataset. By analyzing the project descriptions, we identified three primary categories of ChatGPT-related projects, namely ChatGPT Implementation & Training, ChatGPT Application, and ChatGPT Improvement & Extension. Next, we applied a topic modeling technique to 23,609 issues of those projects and identified ten issue topics, e.g., model reply and interaction interface. We further analyzed the popularity, difficulty, and evolution of each issue topic within the three project categories. Our main findings are: 1) the increase in the number of projects within the three categories is closely related to the development of ChatGPT; and 2) there are significant differences in the popularity, difficulty, and evolutionary trends of the issue topics across the three project categories. Based on these findings, we provide implications for project developers and platform managers on how to better develop and manage ChatGPT-related projects.
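
The two mechanical steps of this pipeline, selecting the most-starred projects and fitting a topic model to issue text, can be illustrated with a minimal sketch. It is not the authors' implementation. The record field `stars`, the function names, the whitespace-tokenized toy issues, the collapsed Gibbs sampler, and the hyperparameters `alpha`, `beta`, and `iters` are all assumptions made for illustration; the study states only that LDA was applied to the issues.

```python
import random
from collections import Counter


def top_k_by_stars(projects, k=200):
    """Return the k projects with the most stars (hypothetical 'stars' field)."""
    return sorted(projects, key=lambda p: p["stars"], reverse=True)[:k]


def fit_lda(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts. Hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


if __name__ == "__main__":
    # Toy data only; real input would be project metadata and issue text.
    projects = [{"name": "a", "stars": 5}, {"name": "b", "stars": 90}, {"name": "c", "stars": 12}]
    print([p["name"] for p in top_k_by_stars(projects, k=2)])

    issues = [
        "install error pip dependency".split(),
        "pip install fails dependency version".split(),
        "model reply empty response".split(),
        "reply truncated model response".split(),
    ]
    doc_topic, topic_word = fit_lda(issues, num_topics=2)
    for t, counts in enumerate(topic_word):
        print(t, [w for w, _ in counts.most_common(3)])
```

In practice a library implementation of LDA with proper preprocessing (stop-word removal, stemming, a tuned topic count) would replace the hand-written sampler; it is written out here only to keep the sketch self-contained.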

Paper Structure (23 sections, 5 equations, 4 figures, 13 tables)