Towards Detecting Prompt Knowledge Gaps for Improved LLM-guided Issue Resolution

Ramtin Ehsani, Sakshi Pathak, Preetha Chatterjee

TL;DR

This paper addresses the problem that prompt knowledge gaps degrade LLM-driven issue resolution. It analyzes 433 developer-ChatGPT conversations from GitHub to identify four gap categories and seven conversation styles, derives three automatic heuristics (Specificity, Contextual Richness, Clarity), and validates a browser-extension prototype for gap detection. The study contributes a manually annotated dataset, empirical associations between gaps and resolution outcomes, and a feasible tooling approach for real-time prompt improvement. The findings suggest that rich, specific, and clear prompts with contextual code and references improve the likelihood of closing issues, offering practical impact for developers seeking to optimize LLM-assisted debugging and issue resolution.

Abstract

Large language models (LLMs) have become essential in software development, especially for issue resolution. However, despite their widespread use, significant challenges persist in the quality of LLM responses to issue resolution queries. LLM interactions often yield incorrect, incomplete, or ambiguous information, largely due to knowledge gaps in prompt design, which can lead to unproductive exchanges and reduced developer productivity. In this paper, we analyze 433 developer-ChatGPT conversations within GitHub issue threads to examine the impact of prompt knowledge gaps and conversation styles on issue resolution. We identify four main knowledge gaps in developer prompts: Missing Context, Missing Specifications, Multiple Context, and Unclear Instructions. Assuming that conversations within closed issues contributed to successful resolutions while those in open issues did not, we find that ineffective conversations contain knowledge gaps in 44.6% of prompts, compared to only 12.6% in effective ones. Additionally, we observe seven distinct conversational styles, with Directive Prompting, Chain of Thought, and Responsive Feedback being the most prevalent. We find that knowledge gaps are present in all styles of conversations, with Missing Context being the most frequently recurring challenge developers face in issue-resolution conversations. Based on our analysis, we identify key textual and code-related heuristics (Specificity, Contextual Richness, and Clarity) that are associated with successful issue closure and help assess prompt quality. These heuristics lay the foundation for an automated tool that can dynamically flag unclear prompts and suggest structured improvements. To test feasibility, we developed a lightweight browser extension prototype for detecting prompt gaps, which can be easily adapted to other tools within developer workflows.
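
To make the idea of heuristic-based gap flagging concrete, the following is a minimal illustrative sketch in Python. It is not the authors' tool or feature set: the signals (presence of code or a URL, word count, share of vague words), the thresholds, the `VAGUE_TERMS` list, and the mapping from signals to gap names are all assumptions made for illustration. The Multiple Context gap is not covered.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker

# Hypothetical word list; the paper's actual Clarity features are not given here.
VAGUE_TERMS = {"it", "this", "that", "something", "stuff", "thing", "somehow"}


@dataclass
class PromptSignals:
    has_code: bool       # rough proxy for Contextual Richness (assumed)
    has_reference: bool  # rough proxy for Contextual Richness (assumed)
    word_count: int      # rough proxy for Specificity (assumed)
    vague_ratio: float   # rough proxy for Clarity (assumed)


def extract_signals(prompt: str) -> PromptSignals:
    """Compute simple surface signals from a prompt string."""
    words = re.findall(r"[A-Za-z']+", prompt.lower())
    vague = sum(1 for w in words if w in VAGUE_TERMS)
    return PromptSignals(
        has_code=FENCE in prompt or bool(re.search(r"`[^`]+`", prompt)),
        has_reference=bool(re.search(r"https?://\S+", prompt)),
        word_count=len(words),
        vague_ratio=vague / len(words) if words else 0.0,
    )


def flag_gaps(prompt: str, min_words: int = 12,
              max_vague_ratio: float = 0.2) -> list[str]:
    """Return possible gap labels; thresholds are arbitrary placeholders."""
    s = extract_signals(prompt)
    flags = []
    if not (s.has_code or s.has_reference):
        flags.append("Missing Context")
    if s.word_count < min_words:
        flags.append("Missing Specifications")
    if s.vague_ratio > max_vague_ratio:
        flags.append("Unclear Instructions")
    return flags
```

Under these assumed thresholds, a terse prompt such as "fix it" would raise all three flags, while a prompt that names a function in inline code, describes the failure, and links a reference would raise none. Any real detector would need features and thresholds calibrated against annotated data such as the paper's dataset.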

Paper Structure (15 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Example of Open vs. Closed Conversations: The closed conversation provides Context and Specifications to ChatGPT, whereas the missing Context in the open conversation leads ChatGPT to hallucinate.
  • Figure 2: Styles of Conversations in Closed vs. Open Issues
  • Figure 3: Multiple Context in a Conversation Linked to an Open Issue
  • Figure 4: Progression of Conversations with Prompt Knowledge Gaps
  • Figure 5: Impact of Features on Model's Outcome Based on SHAP
  • ...and 1 more figure