Thrust: Adaptively Propels Large Language Models with External Knowledge
Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Jianshu Chen
TL;DR
This work tackles the cost and noise of retrieving external knowledge for large language models by introducing Instance-level Adaptive Propulsion of External Knowledge (IAPEK). IAPEK hinges on Thrust, an instance-level knowledgeability score $s(q)$ computed from compact hidden representations and cluster centroids; comparing $s(q)$ against a threshold decides whether external retrieval is necessary. Across seven MC tasks and five open-domain QA tasks, Thrust correlates with knowledge needs and enables cost-efficient augmentation: under budgeted retrieval, it is more cost-efficient than naive use of external knowledge on 88% of the evaluated tasks, with a 26% average performance improvement, and on a subset of tasks it even surpasses full-knowledge usage. The results show that selective, threshold-driven retrieval can reduce noise and latency without sacrificing performance, and often while improving it, offering practical guidance for deploying knowledge-enhanced LMs in resource-constrained settings.
Abstract
Although large-scale pre-trained language models (PTLMs) have been shown to encode rich knowledge in their parameters, this inherent knowledge can be opaque or static, making external knowledge necessary. However, existing information retrieval techniques can be costly and may introduce noisy or even misleading knowledge. To address these challenges, we propose the instance-level adaptive propulsion of external knowledge (IAPEK), in which retrieval is conducted only when necessary. To this end, we measure whether a PTLM contains enough knowledge to solve an instance with a novel metric, Thrust, which leverages the representation distribution of a small number of seen instances. Extensive experiments demonstrate that Thrust is a good measure of PTLMs' instance-level knowledgeability. Moreover, using the Thrust score as the retrieval indicator achieves significantly higher cost-efficiency than the naive usage of external knowledge on 88% of the evaluated tasks, with a 26% average performance improvement. These findings shed light on the real-world practice of knowledge-enhanced LMs under a limited knowledge-seeking budget imposed by computation latency or costs.
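The retrieve-only-when-necessary decision can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy 2-D vectors, the threshold value, and above all the scoring rule, which is a simple inverse-square-distance aggregate over cluster centroids standing in for Thrust rather than the definition used in this work.

```python
import math

def knowledgeability_score(query_vec, clusters):
    """Illustrative stand-in for s(q); NOT the exact Thrust definition.

    clusters: list of (centroid, size) pairs built from a small set of
    seen instances. Each centroid pulls on the query with strength
    size / distance**2; the score is the length of the averaged pull.
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    total = [0.0] * len(query_vec)
    for centroid, size in clusters:
        diff = [c - x for c, x in zip(centroid, query_vec)]
        dist = math.sqrt(sum(d * d for d in diff))
        if dist == 0.0:
            return math.inf  # query coincides with a centroid
        weight = size / dist ** 2
        for i, d in enumerate(diff):
            total[i] += weight * d / dist  # weighted unit vector
    n = len(clusters)
    return math.sqrt(sum((t / n) ** 2 for t in total))

def should_retrieve(query_vec, clusters, threshold):
    """Retrieve external knowledge only when the score falls below threshold."""
    return knowledgeability_score(query_vec, clusters) < threshold

# Toy example: two clusters of seen instances in a 2-D representation space.
clusters = [([0.0, 0.0], 10), ([10.0, 0.0], 10)]
near_query = [0.5, 0.0]  # close to one cluster: high score
mid_query = [5.0, 0.0]   # pulled equally in opposite directions: low score
```

In this toy setting, a query lying close to one cluster receives a high score and skips retrieval, while a query far from every cluster, or pulled equally toward opposing clusters, receives a low score and triggers retrieval. In practice the threshold would be tuned to the available retrieval budget.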
