Demystifying Issues, Causes and Solutions in LLM Open-Source Projects

Yangxiao Cai, Peng Liang, Yifei Wang, Zengyang Li, Mojtaba Shahin

TL;DR

This study presents a practitioner-centered taxonomy of issues in LLM open-source software, based on an analysis of 994 closed GitHub issues sampled from 15 projects. Model Issue is the most frequent issue type. The most frequent causes are Model Problem, Configuration and Connection Problem, and Feature and Method Problem, and Optimize Model is the predominant solution. The authors build a two-tier taxonomy of issues and map the issues to their underlying causes and solutions, yielding actionable guidance for users, developers, and researchers. Key contributions include a labeled dataset of issues, a structured taxonomy, and mappings that show how model and parameter considerations drive challenges in LLM open-source software. The findings have practical implications for prompting strategies, development frameworks, configuration practices, and automated tuning and benchmarking.

Abstract

With the advancement of Large Language Models (LLMs), an increasing number of open-source software projects are using LLMs as their core functional component. Although research and practice on LLMs are attracting considerable interest, no dedicated study has explored the challenges faced by practitioners of LLM open-source projects, the causes of these challenges, and potential solutions. To fill this research gap, we conducted an empirical study of the issues that practitioners encounter when developing and using LLM open-source software, the possible causes of these issues, and potential solutions. We collected all closed issues from 15 LLM open-source projects and labelled the issues that met our requirements. We then randomly selected 994 of the labelled issues as the sample for data extraction and analysis. Our study results show that (1) Model Issue is the most common issue faced by practitioners, (2) Model Problem, Configuration and Connection Problem, and Feature and Method Problem are the most frequent causes of the issues, and (3) Optimize Model is the predominant solution to the issues. Based on the study results, we provide implications for practitioners and researchers of LLM open-source projects.

Paper Structure

This paper contains 47 sections, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Overview of the research process
  • Figure 2: The distribution of the duration of the closed issues in the final data sample
  • Figure 3: Taxonomy of issues of LLM open-source projects