CPR: Mitigating Large Language Model Hallucinations with Curative Prompt Refinement

Jung-Woo Shim, Yeong-Joon Ju, Ji-Hoon Park, Seong-Whan Lee

TL;DR

The paper addresses the problem of hallucinations in large language models caused by ill-formed prompts. It introduces Curative Prompt Refinement (CPR), a plug-and-play pipeline that uses a fine-tuned small language model (via LoRA) to clean prompts and generate informative task descriptions, followed by perplexity-based reranking to assemble a final, well-formed prompt. The authors provide a dataset and training regime for prompt refinement, demonstrate improved output quality and reduced hallucinations across multiple LLMs, and show competitive performance against SelfCheckGPT, especially on highly ill-formed prompts. The work emphasizes a lightweight, model-agnostic preprocessing approach that can be adopted across diverse inference settings to enhance reliability without requiring external knowledge resources. Overall, CPR advances practical LLM reliability by focusing on input quality and informative prompt enrichment rather than solely on model internals or post-hoc corrections.

Abstract

Recent advancements in large language models (LLMs) highlight their fluency in generating responses to diverse prompts. However, these models sometimes generate plausible yet incorrect "hallucinated" facts, undermining trust. A frequent but often overlooked cause of such errors is the use of poorly structured or vague prompts by users, leading LLMs to base responses on assumed rather than actual intentions. To mitigate hallucinations induced by these ill-formed prompts, we introduce Curative Prompt Refinement (CPR), a plug-and-play framework that 1) cleans ill-formed prompts, and 2) generates additional informative task descriptions to align the intention of the user and the prompt using a fine-tuned small language model. When applied to language models, we discover that CPR significantly increases the quality of generation while also mitigating hallucination. Empirical studies show that prompts with CPR applied achieve over a 90% win rate over the original prompts without any external knowledge.

Paper Structure

This paper contains 18 sections, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: An example of our framework for refining prompts and generating informative task descriptions. With the ill-formed prompt as the input, the GPT-3.5 API generates a hallucinatory output, whereas the refined prompt with informative task descriptions generates a high-quality response.
  • Figure 2: Overview of Curative Prompt Refinement (CPR). First, we fine-tune an SLM on our constructed dataset using LoRA to mitigate the computational burden of fine-tuning. We then use the fine-tuned SLM to (a) refine ill-formed user prompts into prompts without any grammatical errors. Following the cleaning process, we (b) generate descriptions of the corresponding prompt and, to maximize the information in the prompt, (c) use a reranking method to prioritize the most relevant descriptions based on perplexity. The well-formed prompt allows the inference LLM to generate a concise response.
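
Step (c) in Figure 2 can be illustrated with a minimal Python sketch of perplexity-based reranking. This is not the authors' implementation. The function names, the `top_k` cutoff, the choice to score the cleaned prompt concatenated with each description, and the rule that lower perplexity ranks higher are all assumptions made for illustration. The scorer `score_fn` is a hypothetical callable returning per-token log-probabilities; in practice it would wrap a language model, and here a toy word-level stand-in is used so the example runs on its own.

```python
import math
from typing import Callable, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rerank_descriptions(
    prompt: str,
    descriptions: Sequence[str],
    score_fn: Callable[[str], List[float]],
    top_k: int = 2,
) -> List[str]:
    """Keep the top_k descriptions with the lowest perplexity.

    Assumption: each description is scored together with the cleaned
    prompt, and lower perplexity is treated as more relevant.
    """
    scored = [
        (perplexity(score_fn(f"{prompt} {desc}")), desc)
        for desc in descriptions
    ]
    scored.sort(key=lambda pair: pair[0])  # ascending perplexity
    return [desc for _, desc in scored[:top_k]]


def assemble_prompt(prompt: str, descriptions: Sequence[str]) -> str:
    """Append the selected descriptions to the cleaned prompt."""
    return " ".join([prompt, *descriptions])


def toy_score_fn(text: str) -> List[float]:
    """Hypothetical stand-in for a language model scorer.

    Known words get log-probability -1.0, unknown words -5.0.
    """
    known = {"summarize", "the", "article", "in", "three", "sentences"}
    return [-1.0 if word.lower() in known else -5.0 for word in text.split()]


if __name__ == "__main__":
    cleaned = "Summarize the article"
    candidates = ["zxq blorp", "in three sentences"]
    best = rerank_descriptions(cleaned, candidates, toy_score_fn, top_k=1)
    print(assemble_prompt(cleaned, best))
```

Passing the scorer in as a callable keeps the reranking logic independent of any particular model, which matches the plug-and-play framing in the text: replacing `toy_score_fn` with a wrapper around a real model's token log-probabilities would not require changing the ranking code.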