Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation

Jiajun Shen, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

TL;DR

The work tackles the opacity of internal parameter knowledge use in LLMs and the trustworthiness of generated citations by introducing Context-Prior Augmented Citation Generation (CPACG). It proposes the Rational Attribution and Elaboration (RAEL) paradigm and Interpretable Trustworthiness Alignment (Intralign) to jointly optimize reliance on external sources and model parameters, with a five-metric evaluation framework spanning answer usefulness, citation faithfulness, and reference trustworthiness. Through multi-dataset experimental studies and ablations across various models, the approach demonstrates improved cross-scenario performance, higher internal and external citation fidelity, and better calibration, while also revealing how external source quality and question difficulty shape knowledge use. The results underscore the importance of adaptive knowledge integration and meticulous alignment when aiming for trustworthy, verifiable AI-generated content. These findings have practical implications for building more transparent and reliable RAG-style systems with accountable references and calibrated confidence reporting.

Abstract

While hallucinations of large language models can be alleviated through retrieval-augmented generation and citation generation, how the model utilizes internal knowledge is still opaque, and the trustworthiness of its generated answers remains questionable. In this work, we introduce the Context-Prior Augmented Citation Generation task, which requires models to generate citations that consider both external and internal knowledge while providing trustworthy references, with 5 evaluation metrics covering 3 aspects: answer helpfulness, citation faithfulness, and trustworthiness. We introduce RAEL, the paradigm for our task, and design INTRALIGN, an integrated method comprising customized data generation and an alignment algorithm. Our experimental results show that our method achieves better cross-scenario performance than the baselines. Our extended experiments further reveal that retrieval quality, question types, and model knowledge have considerable influence on trustworthiness in citation generation.

Paper Structure

This paper contains 32 sections, 2 equations, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Compared with Context-Agree Citation Generation, Context-Prior Augmented Citation Generation allows LLMs to appropriately use and cite parameter knowledge in an interpretable way, and requires LLMs to extract convincing and concise external references, aiming to make internal and external knowledge utilization transparent while enhancing trustworthiness.
  • Figure 2: Illustration of our metrics and the auto evaluation process. We use the same NLI model to check entailment to prevent bias.
  • Figure 3: Example of different Convincingness and Conciseness Scores
  • Figure 4: Overview of Intralign. We first conduct multi-scenario trustworthy data sampling to incorporate parameter knowledge and generate a golden response following our RAEL paradigm. The verified high-quality data will be used for subsequent Interpretability-Focused Alignment, ultimately resulting in a model capable of utilizing parameter knowledge and generating trustworthy citations.
  • Figure 5: Results on the Wikipedia and Reddit datasets. We rescaled each metric to a 0%-100% range.
  • ...and 9 more figures