Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering

Samuel Abedu, SayedHassan Khatoonabadi, Emad Shihab

TL;DR

This work tackles the challenge of answering software-repository questions by grounding a large language model (LLM) in a knowledge graph (KG) constructed from Git metadata (Users, Commits, Issues, Files). The system ingests repository data into a Neo4j KG, uses GPT-4o to translate natural-language questions into Cypher queries, executes those queries, and then generates natural-language answers from the results, using a prompt design that guards against hallucinations. Initial experiments show 65% end-to-end accuracy, which rises to 84% with few-shot chain-of-thought prompting, with the gains most pronounced on complex, multi-hop questions. A task-based user study shows substantial gains in task completion, accuracy, and time savings, indicating that KG-grounded LLMs can make repository data more accessible to both technical and non-technical stakeholders. The work also benchmarks against MSRBot and GPT-4o-search-preview, outperforming both, and outlines future directions in scalable, reasoning-enhanced repository QA.
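To make the question-answering loop concrete, here is a minimal Python sketch of its three stages: Cypher generation, query execution, and answer generation. It is not the authors' implementation. The stage signatures and the `answer_question` and `Answer` names are hypothetical, and so is the empty-result check. The paper guards against hallucinations through its prompt design; this sketch instead adds a simple programmatic check that skips the response LLM when the query returns no rows.

```python
from dataclasses import dataclass
from typing import Any, Callable

Rows = list[dict[str, Any]]

# Hypothetical stage signatures: each stage is a plain callable, so a real
# LLM client or Neo4j driver can be plugged in without changing the pipeline.
QueryGenerator = Callable[[str], str]                # question -> Cypher
QueryExecutor = Callable[[str], Rows]                # Cypher -> result rows
ResponseGenerator = Callable[[str, str, Rows], str]  # (question, Cypher, rows) -> answer

# Illustrative fallback text returned when the query finds nothing.
NO_DATA_ANSWER = "The repository data does not contain an answer to this question."


@dataclass
class Answer:
    question: str
    cypher: str
    rows: Rows
    text: str


def answer_question(
    question: str,
    generate_cypher: QueryGenerator,
    execute: QueryExecutor,
    generate_response: ResponseGenerator,
) -> Answer:
    # Step 1: the query-generator LLM translates the question into Cypher.
    cypher = generate_cypher(question)
    # Step 2: run the Cypher query against the knowledge graph.
    rows = execute(cypher)
    # Assumed programmatic guard (not from the paper): with no rows there is
    # nothing to ground an answer in, so the response LLM is never called.
    if not rows:
        return Answer(question, cypher, rows, NO_DATA_ANSWER)
    # Step 3: the response-generator LLM turns the rows into prose.
    return Answer(question, cypher, rows, generate_response(question, cypher, rows))


if __name__ == "__main__":
    # Stub stages standing in for GPT-4o and Neo4j.
    result = answer_question(
        "How many commits are in the repository?",
        lambda q: "MATCH (c:Commit) RETURN count(c) AS n",
        lambda cypher: [{"n": 42}],
        lambda q, cypher, rows: f"The repository has {rows[0]['n']} commits.",
    )
    print(result.text)  # The repository has 42 commits.
```

Keeping each stage as a plain callable is only an illustrative choice. Stubs make the control flow testable offline, and real GPT-4o and Neo4j calls could be substituted later.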

Abstract

Software repositories contain valuable information for understanding the development process. However, extracting insights from repository data is time-consuming and requires technical expertise. While software engineering chatbots support natural language interactions with repositories, they struggle to understand questions beyond their trained intents and to accurately retrieve the relevant data. This study aims to improve the accuracy of LLM-based chatbots in answering repository-related questions by augmenting them with knowledge graphs. We use a two-step approach: constructing a knowledge graph from repository data, and synergizing the knowledge graph with an LLM to handle natural language questions and answers. We curated 150 questions of varying complexity and evaluated the approach on five popular open-source projects. Our initial results revealed the limitations of the approach, with most errors attributable to the LLM's reasoning ability. We therefore applied few-shot chain-of-thought prompting, which improved accuracy to 84%. We also compared against baselines (MSRBot and GPT-4o-search-preview), and our approach performed significantly better. In a task-based user study with 20 participants, users completed more tasks correctly and in less time with our approach, and they reported that it was useful. Our findings demonstrate that LLMs and knowledge graphs are a viable solution for making repository data accessible.
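The first step of the approach, constructing a knowledge graph from repository data, can be sketched as mapping each commit record to parameterized Cypher statements. This is an illustrative sketch only. `CommitRecord`, `commit_to_statements`, the property names, and the `AUTHORED` and `MODIFIED` relationship types are assumptions rather than the schema from the paper. Issue ingestion is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any

# A (query, parameters) pair ready to send to Neo4j.
Statement = tuple[str, dict[str, Any]]


@dataclass
class CommitRecord:
    # Hypothetical minimal commit record, e.g. parsed from `git log` output.
    sha: str
    author: str
    message: str
    committed_at: str  # ISO 8601 timestamp
    files: list[str] = field(default_factory=list)


def commit_to_statements(commit: CommitRecord) -> list[Statement]:
    """Map one commit to parameterized Cypher MERGE statements.

    MERGE only creates a node or relationship if it does not already exist,
    so re-running ingestion over the same history is idempotent. Labels follow
    the paper's entities; property and relationship names are hypothetical.
    """
    statements: list[Statement] = [(
        "MERGE (u:User {name: $author}) "
        "MERGE (c:Commit {sha: $sha}) "
        "SET c.message = $message, c.committed_at = datetime($committed_at) "
        "MERGE (u)-[:AUTHORED]->(c)",
        {
            "author": commit.author,
            "sha": commit.sha,
            "message": commit.message,
            "committed_at": commit.committed_at,
        },
    )]
    for path in commit.files:
        # One statement per touched file links the commit to a File node.
        statements.append((
            "MATCH (c:Commit {sha: $sha}) "
            "MERGE (f:File {path: $path}) "
            "MERGE (c)-[:MODIFIED]->(f)",
            {"sha": commit.sha, "path": path},
        ))
    return statements


if __name__ == "__main__":
    demo = CommitRecord("abc123", "alice", "Fix null check", "2024-05-01T12:00:00Z", ["src/app.py"])
    for query, params in commit_to_statements(demo):
        print(query, params)
```

In a live setup, each pair could be passed to the official Neo4j Python driver with `session.run(query, params)`. Passing values as parameters, rather than formatting them into the query string, avoids quoting problems with commit messages. A uniqueness constraint on the commit key would also keep MERGE lookups fast.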

Paper Structure

This paper contains 43 sections, 2 equations, 8 figures, and 25 tables.

Figures (8)

  • Figure 1: Overview of our approach in answering software repository-related questions by synergizing LLMs and knowledge graphs.
  • Figure 2: Overview of the schema of the knowledge graph used in this study. The circles represent the entities (Nodes), the directed arrows represent the relationships (Edges), and the boxes show the attributes.
  • Figure 3: Prompt template used by the Query Generator LLM. The prompt includes the current date and time, the schema of the knowledge graph, and the user's question.
  • Figure 4: Prompt template used by the Response Generator LLM. The prompt includes the schema of the knowledge graph, the generated Cypher query for the question, the results returned from executing the Cypher query, and the question.
  • Figure 5: Prompt template for the few-shot chain-of-thought. The prompt includes the current date and time, the schema of the knowledge graph, the chain-of-thought examples, and the user's question.
  • ...and 3 more figures
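The captions of Figures 3, 4, and 5 list which pieces of context each prompt receives. Below is a rough sketch of how such prompts might be assembled. The instruction wording, the function names, and the `SCHEMA` string are hypothetical. Only the choice of inputs follows the figure captions: the date and time, schema, question, generated Cypher, query results, and chain-of-thought examples.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema summary; the study's actual schema is shown in Figure 2.
SCHEMA = (
    "Nodes: User {name}, Commit {sha, message, committed_at}, "
    "Issue {number, title, state}, File {path}\n"
    "Relationships: (User)-[:AUTHORED]->(Commit), (Commit)-[:MODIFIED]->(File)"
)


def query_generator_prompt(question: str, schema: str, now: datetime) -> str:
    # Figure 3 inputs: current date and time, KG schema, user's question.
    # The date lets the LLM resolve relative phrases such as "last month".
    return (
        f"Current date and time: {now.isoformat()}\n"
        f"Knowledge graph schema:\n{schema}\n"
        "Write one Cypher query that answers the question, using only the "
        "labels, relationships, and properties in the schema.\n"
        f"Question: {question}\nCypher:"
    )


def response_generator_prompt(question: str, schema: str, cypher: str, results: list) -> str:
    # Figure 4 inputs: schema, generated Cypher, query results, question.
    return (
        f"Knowledge graph schema:\n{schema}\n"
        f"Cypher query:\n{cypher}\n"
        f"Query results (JSON):\n{json.dumps(results, default=str)}\n"
        "Answer using only these results. If they are empty, say the data "
        "is not available instead of guessing.\n"
        f"Question: {question}\nAnswer:"
    )


def few_shot_cot_prompt(
    question: str, schema: str, examples: list[tuple[str, str, str]], now: datetime
) -> str:
    # Figure 5 inputs: date and time, schema, chain-of-thought examples, question.
    # Each example is a (question, reasoning, cypher) triple.
    shots = "\n\n".join(
        f"Question: {q}\nReasoning: {r}\nCypher: {c}" for q, r, c in examples
    )
    return (
        f"Current date and time: {now.isoformat()}\n"
        f"Knowledge graph schema:\n{schema}\n"
        f"Examples:\n{shots}\n\n"
        "Reason step by step as in the examples, then give the Cypher query.\n"
        f"Question: {question}\nReasoning:"
    )


if __name__ == "__main__":
    print(query_generator_prompt("Who made the most commits?", SCHEMA, datetime.now(timezone.utc)))
```

Ending the few-shot prompt with `Reasoning:` nudges the model to write out its intermediate steps before the query, which is the behavior the chain-of-thought examples demonstrate.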