
Semi-Structured Chain-of-Thought: Integrating Multiple Sources of Knowledge for Improved Language Model Reasoning

Xin Su, Tiep Le, Steven Bethard, Phillip Howard

TL;DR

This work tackles knowledge-intensive reasoning by using semi-structured reasoning chains to integrate three sources of knowledge: the model's parametric memory, external structured knowledge graphs (KGs), and unstructured text. The method first parses questions into masked triplets. It then fills the masks by grounding to a KG, then by querying unstructured text, and finally by drawing on the LLM's own memory, all without fine-tuning the model. On 2WikiMultihopQA, MuSiQue-Ans, and Bamboogle, it achieves state-of-the-art performance, shows strong gains over inference-only baselines, and even outperforms some fine-tuned approaches. Further analyses examine the role of each knowledge source and the effect of model size. The approach offers a scalable, inference-time way to fuse diverse knowledge sources, reducing hallucination and increasing factual accuracy on complex reasoning tasks. The paper also discusses its limitations and directions for future work.
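The fill order described above can be pictured as a fallback cascade over masked triplets. The Python sketch that follows is a minimal illustration, not the authors' implementation. It assumes that each source is tried in the order the TL;DR lists them (KG, then text, then the LLM's memory), that the first source to return a value wins, and that later triplets refer to earlier answers through placeholders such as `#1`. The `MaskedTriplet` class, the `fill_chain` and `resolve` functions, the three lookup functions, and the toy data are all hypothetical. Real KG grounding, text retrieval, and LLM prompting are replaced by dictionary lookups and a stub. The example chain encodes a made-up two-hop question, "Where was the director of Film X born?"

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical interface: a knowledge source maps (subject, relation) to an
# object string, or returns None when it cannot fill the slot.
Lookup = Callable[[str, str], Optional[str]]


@dataclass
class MaskedTriplet:
    subject: str               # an entity, or a placeholder like "#1" for an earlier answer
    relation: str
    obj: Optional[str] = None  # the masked slot; None until some source fills it


def resolve(ref: str, answers: Dict[str, str]) -> str:
    """Swap a placeholder such as '#1' for the answer already found at that step."""
    return answers.get(ref, ref)


def fill_chain(chain: List[MaskedTriplet], sources: List[Lookup]) -> Dict[str, str]:
    """Fill masked triplets in order, trying the sources in priority order."""
    answers: Dict[str, str] = {}
    for step, triplet in enumerate(chain, start=1):
        subject = resolve(triplet.subject, answers)
        if triplet.obj is None:
            for lookup in sources:  # the first source that returns a value wins
                value = lookup(subject, triplet.relation)
                if value is not None:
                    triplet.obj = value
                    break
        if triplet.obj is None:
            raise ValueError(f"no source could fill step #{step}")
        answers[f"#{step}"] = triplet.obj
    return answers


# Toy stand-ins for the three sources; all entities and facts here are made up.
TOY_KG = {("Film X", "director"): "Person A"}
TOY_TEXT = {("Person A", "place of birth"): "City B"}


def kg_lookup(subject: str, relation: str) -> Optional[str]:
    return TOY_KG.get((subject, relation))  # stands in for KG grounding


def text_lookup(subject: str, relation: str) -> Optional[str]:
    return TOY_TEXT.get((subject, relation))  # stands in for reading retrieved passages


def llm_lookup(subject: str, relation: str) -> Optional[str]:
    return None  # stands in for prompting the LLM to answer from parametric memory


chain = [MaskedTriplet("Film X", "director"), MaskedTriplet("#1", "place of birth")]
print(fill_chain(chain, [kg_lookup, text_lookup, llm_lookup]))
# {'#1': 'Person A', '#2': 'City B'}
```

Here the first hop is answered by the toy KG and the second falls through to the toy text source. Reordering `sources` changes which knowledge wins when sources disagree. The strict first-hit cascade is a simplifying assumption, and the paper's actual procedure for combining sources may differ.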

Abstract

An important open question in the use of large language models for knowledge-intensive tasks is how to effectively integrate knowledge from three sources: the model's parametric memory, external structured knowledge, and external unstructured knowledge. Most existing prompting methods either rely on one or two of these sources, or require repeatedly invoking large language models to generate similar or identical content. In this work, we overcome these limitations by introducing a novel semi-structured prompting approach that seamlessly integrates the model's parametric memory with unstructured knowledge from text documents and structured knowledge from knowledge graphs. Experimental results on open-domain multi-hop question answering datasets demonstrate that our prompting method significantly surpasses existing techniques, even exceeding those that require fine-tuning.

Paper Structure

This paper contains 53 sections, 2 figures, 10 tables, and 2 algorithms.

Figures (2)

  • Figure 1: Overview of our approach using different sources of knowledge.
  • Figure 2: Comparison of structured and unstructured knowledge.