Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses

Juyeon Kim, Jeongeun Lee, Yoonho Chang, Chanyeol Choi, Junseong Kim, Jy-yong Sohn

TL;DR

Re-Ex tackles LLM hallucinations by post-editing outputs through a three-step pipeline: external evidence retrieval, factual error explanation, and revision based on explanations. The approach emphasizes an explanation step to guide revision, showing superior detection and revision performance with lower latency and fewer tokens than strong baselines across GPT-3.5 and GPT-4 on long-form datasets. Ablation studies confirm the necessity of evidence retrieval and explanation, and the method benefits from external sources over internal knowledge. Collectively, Re-Ex offers a practical, efficient pathway to more trustworthy LLM-generated content with broad implications for deployment in real-world applications.

Abstract

Mitigating hallucination is a key challenge that must be overcome to reliably deploy large language models (LLMs) in real-world scenarios. Recently, various methods have been proposed to detect and revise factual errors in LLM-generated text in order to reduce hallucination. In this paper, we propose Re-Ex, a method for post-editing LLM-generated responses. Re-Ex introduces a novel reasoning step called the factual error explanation step. Re-Ex revises the initial response of an LLM in three steps: first, external tools are used to retrieve evidence of the factual errors in the initial LLM response; next, the LLM is instructed to explain the problematic parts of the response based on the gathered evidence; finally, the LLM revises the initial response using the explanations provided in the previous step. In addition to the explanation step, Re-Ex also incorporates new prompting techniques to reduce the token count and inference time required for the response revision process. Compared with existing methods including FacTool, CoVE, and RARR, Re-Ex provides better detection and revision performance with less inference time and fewer tokens on multiple benchmarks.

Paper Structure (30 sections, 1 equation, 2 figures, 14 tables)

This paper contains 30 sections, 1 equation, 2 figures, 14 tables.

Figures (2)

  • Figure 1: An example of revising the response with Re-Ex. The initial response $R_{\text{initial}}$ states that the US has 94 operating reactors, which search result $A_1$ shows to be inaccurate. Re-Ex explains the factual error in $E_1$ and revises the response accordingly.
  • Figure 2: Overview of Re-Ex, which revises the initial response $R_{\text{initial}}$ of an LLM. First, Re-Ex retrieves evidence of factual errors in $R_{\text{initial}}$ using external tools: it generates sub-questions $\{Q_i\}_{i=1}^N$ useful for checking for factual errors and then obtains the answers (the evidence) $\{A_i\}_{i=1}^N$ from external sources. Second, Re-Ex has the LLM explain the factual errors in $R_{\text{initial}}$ based on the evidence, yielding the explanations $\{E_i\}_{i=1}^M$. Finally, the LLM revises its response based on the explanations, outputting the revised response $R_{\text{revised}}$.