Can Language Models Act as Knowledge Bases at Scale?

Qiyuan He, Yizhong Wang, Wenya Wang

TL;DR

The findings indicate that while LLMs hold promise as large-scale KBs capable of retrieving and responding with flexibility, enhancements in their reasoning capabilities are necessary to fully realize their potential.

Abstract

Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating responses to complex queries through large-scale pre-training. However, the efficacy of these models in memorizing and reasoning over large-scale structured knowledge, especially world knowledge that explicitly covers abundant factual information, remains questionable. Addressing this gap, our research investigates whether LLMs can effectively store, recall, and reason with knowledge at a scale comparable to the latest knowledge bases (KBs) such as Wikidata. Specifically, we focus on three crucial aspects of this viability: (1) the efficiency of LLMs of different sizes in memorizing the exact knowledge in a large-scale KB; (2) the flexibility of recalling the memorized knowledge in response to natural language queries; (3) the capability to infer new knowledge through reasoning. Our findings indicate that while LLMs hold promise as large-scale KBs capable of retrieving and responding with flexibility, enhancements in their reasoning capabilities are necessary to fully realize their potential.

Paper Structure (22 sections, 1 equation, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: Distribution of entity and relation occurrences in world knowledge $\mathcal{D}_0$.
  • Figure 2: Learning curves of T5-base training on $\mathcal{D}_{1}$ with and without importance sampling (ImSmp), evaluated using $\mathcal{D}_{1\text{-Eval}}$.
  • Figure 3: Evaluating the fixed-form information recall ability of LMs trained on $\mathcal{D}_0$. T5 models are in the top row, and LLaMA-2 models are in the bottom row.
  • Figure 4: PopQA finetuning performance and knowledge recall at various checkpoints during training on $\mathcal{D}_0$. Epoch $0$ denotes the pre-trained models.
  • Figure 5: Evaluating the ability to infer new knowledge across model checkpoints during training on $\mathcal{D}_0$. The x-axis of each plot gives the number of epochs the LM had been trained on $\mathcal{D}_0$ at that checkpoint; epoch $0$ denotes the pre-trained checkpoints.