KSOD: Knowledge Supplement for LLMs On Demand

Haoran Li, Junfeng Hu

TL;DR

This paper presents KSOD (Knowledge Supplement for LLMs On Demand), a framework that targets errors caused by missing domain knowledge. KSOD decouples knowledge from tasks and proceeds in three stages: Knowledge Identification, Knowledge Verification, and Knowledge Supplement. A LoRA-based knowledge module is trained on datasets containing the identified knowledge, the knowledge gap is verified via embedding-distribution clustering, and the module is injected into the LLM once the gap is confirmed. Experiments across two domain-specific and four general benchmarks show that supplementing the LLM with verified knowledge reduces errors on tasks requiring that knowledge while preserving, or modestly improving, performance on other tasks. The results point to knowledge-based SFT as a practical way to augment LLMs with targeted knowledge on demand without compromising broad task performance.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet still produce errors in domain-specific tasks. To further improve their performance, we propose KSOD (Knowledge Supplement for LLMs On Demand), a novel framework that empowers LLMs to improve their capabilities with knowledge-based supervised fine-tuning (SFT). KSOD analyzes the causes of errors from the perspective of knowledge deficiency by identifying potential missing knowledge in the LLM that may lead to the errors. Subsequently, KSOD tunes a knowledge module on a knowledge dataset and uses it to verify whether the LLM lacks the identified knowledge. If the knowledge is verified, KSOD supplements the LLM with the identified knowledge using the knowledge module. Tuning LLMs on specific knowledge instead of specific tasks decouples task and knowledge. Our experiments on two domain-specific benchmarks and four general benchmarks empirically demonstrate that KSOD enhances the performance of LLMs on tasks requiring the supplemented knowledge while preserving their performance on other tasks. Our findings shed light on the potential of improving the capabilities of LLMs with knowledge-based SFT.
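
The identify, tune, verify, and supplement flow described in the abstract can be sketched as plain control flow. This is only a minimal illustration, not the authors' implementation: the names `ksod`, `KsodResult`, `identify`, `tune_module`, and `lacks_knowledge` are hypothetical, and each stage is passed in as a callable because the abstract does not specify how any stage is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple

# One error sample: (input, output with errors, correct reference).
Sample = Tuple[str, str, str]


@dataclass
class KsodResult:
    knowledge: str          # knowledge identified as potentially missing
    verified: bool          # whether the LLM was judged to lack it
    module: Optional[Any]   # knowledge module to inject, or None if unverified


def ksod(
    error_samples: Sequence[Sample],
    identify: Callable[[Sequence[Sample]], str],   # hypothetical stage (a)
    tune_module: Callable[[str], Any],             # hypothetical: tunes a knowledge module
    lacks_knowledge: Callable[[Any], bool],        # hypothetical stage (b)
) -> KsodResult:
    """Run the three stages in order; supplement only when verification passes."""
    knowledge = identify(error_samples)
    module = tune_module(knowledge)
    verified = lacks_knowledge(module)
    # Stage (c): hand the module back for injection only if the gap is confirmed.
    return KsodResult(knowledge, verified, module if verified else None)
```

Passing the stages as callables keeps the sketch independent of any particular LLM or fine-tuning library. The point it shows is the gating: the tuned module is returned for injection only when verification confirms the gap.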

Paper Structure

This paper contains 32 sections, 1 equation, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: On the left side of the figure, samples from Task 1 are presented in the form of (input, output with errors, correct reference). Based on these samples, KSOD identifies the missing knowledge as discourse relations. After verifying that the LLM lacks this knowledge, KSOD supplements the LLM with it. As shown on the right side of the figure, after supplementation the model generates correct outputs not only for Task 1 but also for another task (Task 2) that requires the discourse relation knowledge.
  • Figure 2: Our KSOD framework consists of three stages: (a) Knowledge Identification; (b) Knowledge Verification; (c) Knowledge Supplement.
  • Figure 3: t-SNE (van der Maaten and Hinton, 2008) visualization of the embedding distribution. Each color represents a category within the categorical knowledge, based on dataset labels. Each embedding is the last-token embedding from the B matrix of LoRA, computed on the test set.
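
Figure 3 is a visual check of how embeddings group by label. One way to put a number on that grouping is a silhouette coefficient, sketched below with NumPy. This metric is swapped in purely for illustration: the source does not state which clustering measure KSOD uses, nor how a score maps to a verification decision. The function name `silhouette` and the toy data are assumptions.

```python
import numpy as np


def silhouette(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient over all points.

    Values near 1 indicate tight, well-separated label clusters; values
    near 0 or below indicate overlapping clusters. Requires at least two
    distinct labels. Illustrative stand-in, not the paper's metric.
    """
    # Pairwise Euclidean distances, shape (n, n).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    classes = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # exclude the point itself
        if n_same == 0:
            continue  # singleton cluster: score stays 0 by convention
        # Mean distance to own cluster (self-distance is 0, so the sum is unaffected).
        a = dist[i, same].sum() / n_same
        # Smallest mean distance to any other cluster.
        b = min(dist[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


# Toy stand-ins for last-token embeddings and their dataset labels.
toy_embeddings = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_labels = np.array([0, 0, 1, 1])
toy_score = silhouette(toy_embeddings, toy_labels)
```

In practice the embeddings would be extracted from the tuned knowledge module rather than written by hand, and any decision threshold on the score would have to be chosen empirically.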