Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation
Jiajun Shen, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao
TL;DR
The work tackles two problems in LLMs: the opacity of how internal (parametric) knowledge is used, and the questionable trustworthiness of generated citations. It introduces Context-Prior Augmented Citation Generation (CPACG) and proposes the Rational Attribution and Elaboration (RAEL) paradigm and Interpretable Trustworthiness Alignment (Intralign) to jointly optimize reliance on external sources and on model parameters. Evaluation uses a five-metric framework spanning answer helpfulness, citation faithfulness, and reference trustworthiness. Experiments and ablations across multiple datasets and models show improved cross-scenario performance, higher fidelity of both internal and external citations, and better calibration. They also reveal how external source quality and question difficulty shape knowledge use. The results underscore the importance of adaptive knowledge integration and careful alignment for trustworthy, verifiable AI-generated content, with practical implications for building more transparent and reliable RAG-style systems that provide accountable references and calibrated confidence reporting.
Abstract
While hallucinations of large language models can be alleviated through retrieval-augmented generation and citation generation, how these models utilize internal knowledge remains opaque, and the trustworthiness of their generated answers is still questionable. In this work, we introduce the Context-Prior Augmented Citation Generation task, which requires models to generate citations that draw on both external and internal knowledge while providing trustworthy references. We evaluate the task with 5 metrics covering 3 aspects: answer helpfulness, citation faithfulness, and trustworthiness. We introduce RAEL, the paradigm for our task, and design INTRALIGN, an integrated method comprising customized data generation and an alignment algorithm. Our experimental results show that our method achieves better cross-scenario performance than other baselines. Our extended experiments further reveal that retrieval quality, question types, and model knowledge considerably influence the trustworthiness of citation generation.
