KSOD: Knowledge Supplement for LLMs On Demand
Haoran Li, Junfeng Hu
TL;DR
This paper presents KSOD (Knowledge Supplement for LLMs On Demand), a framework that addresses LLM errors caused by missing domain knowledge. KSOD decouples knowledge from tasks and proceeds in three stages: Knowledge Identification, Knowledge Verification, and Knowledge Supplement. A LoRA-based knowledge module is trained on datasets containing the identified knowledge, the knowledge gap is verified via embedding-distribution clustering, and the module is injected into the LLM once the gap is confirmed. Experiments on two domain-specific and four general benchmarks show that supplementing the LLM with verified knowledge reduces errors on knowledge-requiring tasks while preserving, or modestly improving, performance on other tasks. The results suggest that knowledge-based SFT is a practical way to augment LLMs with targeted knowledge on demand without compromising broad task performance.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet they still produce errors in domain-specific tasks. To further improve their performance, we propose KSOD (Knowledge Supplement for LLMs On Demand), a novel framework that empowers LLMs to improve their capabilities through knowledge-based supervised fine-tuning (SFT). KSOD analyzes the causes of errors from the perspective of knowledge deficiency, identifying knowledge that the LLM may be missing and that could lead to those errors. KSOD then tunes a knowledge module on a knowledge dataset and uses that module to verify whether the LLM lacks the identified knowledge. If the lack of knowledge is verified, KSOD supplements the LLM with the identified knowledge using the knowledge module. Tuning LLMs on specific knowledge rather than on specific tasks decouples task from knowledge. Our experiments on two domain-specific benchmarks and four general benchmarks empirically demonstrate that KSOD enhances the performance of LLMs on tasks requiring the supplemented knowledge while preserving their performance on other tasks. Our findings shed light on the potential of improving the capabilities of LLMs with knowledge-based SFT.
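The identify, verify, supplement flow described above can be sketched as plain control flow. This is only an illustrative outline, not the authors' implementation: the names `ksod_pipeline` and `KsodResult`, and the four callables passed in, are hypothetical placeholders standing in for the actual identification, module tuning, verification, and injection procedures.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class KsodResult:
    knowledge: Optional[str]  # identified candidate knowledge, if any
    verified: bool            # True if the LLM was judged to lack it
    supplemented: bool        # True if the knowledge module was injected


def ksod_pipeline(
    errors: List[str],
    identify: Callable[[List[str]], Optional[str]],  # hypothetical stage 1
    tune_module: Callable[[str], object],            # hypothetical module tuning
    verify: Callable[[object], bool],                # hypothetical stage 2
    supplement: Callable[[object], None],            # hypothetical stage 3
) -> KsodResult:
    """Control flow only: identify -> tune module -> verify -> supplement."""
    knowledge = identify(errors)
    if knowledge is None:
        # No candidate missing knowledge was found for these errors.
        return KsodResult(None, False, False)

    # Tune a knowledge module on data for the identified knowledge.
    module = tune_module(knowledge)

    if not verify(module):
        # The LLM does not appear to lack this knowledge; leave it unchanged.
        return KsodResult(knowledge, False, False)

    # Only verified knowledge is supplemented into the LLM.
    supplement(module)
    return KsodResult(knowledge, True, True)
```

The point of the sketch is the gating: the module is tuned before verification because verification relies on it, and injection happens only when verification succeeds.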
