Internal and External Knowledge Interactive Refinement Framework for Knowledge-Intensive Question Answering

Haowei Du, Dongyan Zhao

TL;DR

IEKR presents an internal-external knowledge interactive refinement framework that leverages latent knowledge inside LLMs to improve retrieval from external knowledge bases and uses retrieved evidence to refine the model's internal knowledge. The approach combines internal knowledge reflection, a cross-encoder-based external retriever, and an LLM-based reader to produce accurate answers for knowledge-intensive QA tasks. Across CommonsenseQA, OpenBookQA, and MedQA-USMLE, IEKR achieves state-of-the-art results, with ablations confirming the importance of both internal reflection and external retrieval. The framework also demonstrates strong generalization to open-domain QA and provides insights into retrieval quantity trade-offs, highlighting its practical impact for robust, knowledge-grounded QA systems.

Abstract

Recent work has attempted to integrate external knowledge into LLMs to address the limitations and potential factual errors of LLM-generated content. However, retrieving the correct knowledge from a large external knowledge base remains a challenge. We empirically observe that LLMs already encode rich knowledge in their pretrained parameters, and that using this internal knowledge improves the retrieval of external knowledge in knowledge-intensive tasks. In this paper, we propose a new internal and external knowledge interactive refinement paradigm, dubbed IEKR, which uses the internal knowledge of the LLM to help retrieve relevant knowledge from the external knowledge base and exploits the external knowledge to correct hallucinations in the generated internal knowledge. By simply adding a prompt such as 'Tell me something about' to the LLM, we elicit related explicit knowledge and feed it, together with the query, into the retriever for external retrieval. The retrieved external knowledge then complements the internal knowledge in the input to the LLM, which produces the answer. We conduct experiments on three benchmark datasets for knowledge-intensive question answering with different LLMs and domains, achieving new state-of-the-art results. Further analysis shows the effectiveness of the different modules in our approach.

Paper Structure (21 sections, 4 equations, 2 figures, 12 tables)

Figures (2)

  • Figure 1: An example from the OpenBookQA dataset.
  • Figure 2: The pipeline of our approach. Our model comprises two modules: a retriever $\mathcal{R}$ that retrieves relevant knowledge facts from the external KB, and an LLM reader $\mathcal{A}$ that answers the query based on the model's internal knowledge and the retrieved external knowledge. Red denotes the internal knowledge needed for the question; blue denotes the external knowledge retrieved with the help of internal knowledge.
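
The pipeline in Figure 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`reflect_prompt`, `retrieve`, `build_reader_input`, `answer`), the exact prompt layout, and the `llm` callable are all hypothetical, and a toy word-overlap score stands in for the cross-encoder retriever $\mathcal{R}$. It only shows the flow of information: reflect on internal knowledge, retrieve with the query plus that knowledge, then read with both.

```python
from typing import Callable, List


def reflect_prompt(query: str) -> str:
    # Internal knowledge reflection; the exact wording after the
    # 'Tell me something about' prefix is an assumption.
    return f"Tell me something about {query}"


def overlap_score(text: str, fact: str) -> int:
    # Toy lexical overlap. Assumption: this replaces the paper's
    # cross-encoder retriever purely for illustration.
    return len(set(text.lower().split()) & set(fact.lower().split()))


def retrieve(query: str, internal: str, kb: List[str], k: int = 2) -> List[str]:
    # The retriever sees the query together with the internal knowledge.
    expanded = f"{query} {internal}"
    ranked = sorted(kb, key=lambda fact: overlap_score(expanded, fact), reverse=True)
    return ranked[:k]


def build_reader_input(query: str, internal: str, external: List[str]) -> str:
    # External facts complement the internal knowledge in the reader input.
    # The layout of this prompt is hypothetical.
    facts = "\n".join(f"- {fact}" for fact in external)
    return (
        f"Internal knowledge: {internal}\n"
        f"External knowledge:\n{facts}\n"
        f"Question: {query}\nAnswer:"
    )


def answer(query: str, llm: Callable[[str], str], kb: List[str], k: int = 2) -> str:
    # `llm` is any text-in, text-out callable (hypothetical interface).
    internal = llm(reflect_prompt(query))
    external = retrieve(query, internal, kb, k)
    return llm(build_reader_input(query, internal, external))
```

Any text-in, text-out function can be passed as `llm`, so the flow can be exercised with a stub before wiring in a real model and a learned retriever.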