De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding

Aryaz Eghbali, Michael Pradel

TL;DR

This paper tackles LLM hallucinations in code generation that arise because the model lacks knowledge of project-specific APIs. It introduces De-Hallucinator, a grounding approach that combines retrieval-augmented generation with an iterative prompting loop: project API references are extracted by a static pre-analysis, and the references most relevant to the model's own predictions are added to the prompt. Across Python and JavaScript tasks and five state-of-the-art LLMs, the method improves edit distance and API recall for code completion, and yields higher test success and coverage for test generation. The approach is model-agnostic, scalable, and designed to be practical for real-world IDE workflows. Overall, the work demonstrates that iterative, reference-grounded prompts can substantially reduce hallucinations while maintaining prompt efficiency and deployment practicality.

Abstract

Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks. However, these models are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that predictions by LLMs often resemble the desired code, but they fail to correctly refer to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the model's initial predictions and adds these references into the prompt. Unlike retrieval-augmented generation (RAG), our approach uses the initial prediction(s) by the model to iteratively retrieve increasingly suitable API references. Our evaluation applies the approach to two tasks: predicting API usages in Python and generating tests in JavaScript. We show that De-Hallucinator consistently improves the generated code across five LLMs. In particular, the approach improves the edit distance by 23.3-50.6% and the recall of correctly predicted API usages by 23.9-61.0% for code completion, and improves the number of fixed tests that initially failed because of hallucinations by 63.2%, resulting in a 15.5% increase in statement coverage for test generation.
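The iterative grounding loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap similarity, the prompt layout, the stopping rule, and every name below (`retrieve`, `iterative_grounding`, `toy_model`, the `DataStore` references) are assumptions chosen for illustration, and the model is a hard-coded stand-in rather than a real LLM.

```python
import re
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Split text into lowercase alphabetic tokens (find_item -> find, item)."""
    return {t.lower() for t in re.findall(r"[A-Za-z]+", text)}


def retrieve(query: str, api_refs: List[str], k: int = 2) -> List[str]:
    """Return the k API references most similar to the query.

    Jaccard token overlap is an assumed stand-in for whatever similarity
    measure a real system would use.
    """
    q = tokens(query)

    def score(ref: str) -> float:
        r = tokens(ref)
        union = q | r
        return len(q & r) / len(union) if union else 0.0

    return sorted(api_refs, key=score, reverse=True)[:k]


def iterative_grounding(
    prompt: str,
    model: Callable[[str], str],
    api_refs: List[str],
    max_iters: int = 3,
    k: int = 2,
) -> str:
    """Query the model, then re-query with API references retrieved
    from its own previous prediction until the output stops changing."""
    completion = model(prompt)  # initial, possibly hallucinated, prediction
    for _ in range(max_iters):
        refs = retrieve(completion, api_refs, k)
        # Hypothetical prompt layout: references as comments before the code.
        header = "# Relevant project APIs:\n" + "\n".join("# " + r for r in refs)
        new_completion = model(header + "\n" + prompt)
        if new_completion == completion:
            break  # fixed point reached
        completion = new_completion
    return completion


# Hypothetical project API references, as a pre-analysis might extract them.
API_REFS = [
    "DataStore.lookup(self, key) -> Item",
    "DataStore.insert(self, key, item) -> None",
    "Logger.warn(self, msg) -> None",
]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: invents find_item unless the real API is in the prompt."""
    if "DataStore.lookup" in prompt:
        return "return self.store.lookup(key)"
    return "return self.store.find_item(key)"


result = iterative_grounding("def search(self, key):\n    ", toy_model, API_REFS)
print(result)  # return self.store.lookup(key)
```

The first prediction uses the non-existent `find_item`, but it is close enough to the real `lookup` reference for retrieval to surface it, so the second query produces a grounded completion. This mirrors the abstract's observation that initial predictions often resemble the desired code even when they refer to the wrong API.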

Paper Structure (56 sections, 1 equation, 8 figures, 5 tables)

Figures (8)

  • Figure 1: The desired completion of search is highlighted in gray.
  • Figure 2: The completion of search by CodeGen-2B-mono highlighted in gray, and the wrong API usage highlighted in red.
  • Figure 3: Overview of De-Hallucinator.
  • Figure 4: Step-by-step progression of De-Hallucinator on the DataStore example.
  • Figure 5: Completion by CodeGen highlighted in red, the ground truth highlighted in green, and the completion by De-Hallucinator after augmenting the prompt with relevant APIs highlighted in blue.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Definition 1: API reference