Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs

Yifei Zhang, Xintao Wang, Jiaqing Liang, Sirui Xia, Lida Chen, Yanghua Xiao

TL;DR

This work presents Chain-of-Knowledge (CoK), a framework that integrates knowledge reasoning into large language models by learning from knowledge graphs. It introduces KnowReason, a KG-derived dataset built via rule mining, knowledge selection, and sample generation, and pairs it with a supervised learning approach that includes naive training and a trial-and-error mechanism to mitigate rule overfitting. Empirical results show CoK improves knowledge reasoning and general reasoning capabilities on anonymized data, with the Trial-and-Error variant enhancing robustness to out-of-distribution scenarios and reducing rule dependency. The approach generalizes to downstream tasks and commonsense reasoning benchmarks, suggesting a practical pathway to more reliable, KG-informed reasoning in LLMs. The work also highlights considerations around data leakage in regular settings and proposes a model-specific evaluation regime for real-world deployment.

Abstract

Large Language Models (LLMs) have exhibited impressive proficiency in various natural language processing (NLP) tasks, which involve increasingly complex reasoning. Knowledge reasoning, a primary type of reasoning, aims at deriving new knowledge from existing knowledge. While it has been widely studied in the context of knowledge graphs (KGs), knowledge reasoning in LLMs remains underexplored. In this paper, we introduce Chain-of-Knowledge (CoK), a comprehensive framework for knowledge reasoning, including methodologies for both dataset construction and model learning. For dataset construction, we create KnowReason via rule mining on KGs. For model learning, we observe rule overfitting induced by naive training. Hence, we enhance CoK with a trial-and-error mechanism that simulates the human process of internal knowledge exploration. We conduct extensive experiments with KnowReason. Our results show the effectiveness of CoK in improving LLMs not only on knowledge reasoning but also on general reasoning benchmarks.
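
To make "rule mining on KGs" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy KG of made-up triples and mines only two-hop compositional rules of the form r1(x, y) AND r2(y, z) => r3(x, z), scored by simple support and confidence counts. The function names, thresholds, and the question template are hypothetical choices for illustration. The knowledge selection step is omitted because the text above does not specify it.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples; all facts are made up.
TRIPLES = [
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("alice", "nationality", "france"),
    ("bob", "born_in", "berlin"),
    ("berlin", "city_of", "germany"),
    ("bob", "nationality", "germany"),
    ("carol", "born_in", "rome"),
    ("rome", "city_of", "italy"),
]


def index_by_head(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    by_head = defaultdict(list)
    for head, relation, tail in triples:
        by_head[head].append((relation, tail))
    return by_head


def mine_composition_rules(triples, min_support=2, min_confidence=0.5):
    """Mine rules r1(x, y) AND r2(y, z) => r3(x, z) by counting 2-hop paths.

    Assumption: support is the number of paths closed by r3, and confidence
    is support divided by the number of all (r1, r2) paths.
    """
    by_head = index_by_head(triples)
    relations_between = defaultdict(set)  # (head, tail) -> {relations}
    for head, relation, tail in triples:
        relations_between[(head, tail)].add(relation)

    body_count = defaultdict(int)  # (r1, r2) -> number of 2-hop paths
    rule_count = defaultdict(int)  # (r1, r2, r3) -> paths closed by r3
    for x, r1, y in triples:
        for r2, z in by_head.get(y, []):
            body_count[(r1, r2)] += 1
            for r3 in relations_between.get((x, z), ()):
                rule_count[(r1, r2, r3)] += 1

    rules = []
    for (r1, r2, r3), support in rule_count.items():
        confidence = support / body_count[(r1, r2)]
        if support >= min_support and confidence >= min_confidence:
            rules.append((r1, r2, r3, support, confidence))
    return rules


def generate_samples(triples, rule):
    """Turn every grounding of the rule body into a question with a chain."""
    r1, r2, r3 = rule[:3]
    by_head = index_by_head(triples)
    samples = []
    for x, relation, y in triples:
        if relation != r1:
            continue
        for second_relation, z in by_head.get(y, []):
            if second_relation != r2:
                continue
            samples.append({
                "question": f"What is the {r3} of {x}?",
                "chain": [f"{x} {r1} {y}", f"{y} {r2} {z}"],
                "answer": z,
            })
    return samples


if __name__ == "__main__":
    for mined_rule in mine_composition_rules(TRIPLES):
        print("rule:", mined_rule)
        for sample in generate_samples(TRIPLES, mined_rule):
            print("  sample:", sample)
```

In this toy graph the nationality of carol is not stored as a triple, so the sample generated for carol illustrates the idea of deriving new knowledge by chaining two existing facts.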

Paper Structure

This paper contains 54 sections, 2 equations, 2 figures, 13 tables, 2 algorithms.

Figures (2)

  • Figure 1: Current LLMs struggle with knowledge reasoning, i.e., combining acquired knowledge to infer new knowledge.
  • Figure 2: The framework of Chain-of-Knowledge (CoK). (a) Dataset Construction includes three steps, i.e., rule mining, knowledge selection, and sample generation. This yields a knowledge dataset and a CoK dataset. (b) Vanilla CoK trains LLMs in a behavior cloning manner, which may induce rule overfitting and hallucination. (c) CoK (Trial and Error) is hence proposed, which enables LLMs to simulate humans' internal process of knowledge exploration.