ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base

Siyu Yuan, Jiangjie Chen, Changzhi Sun, Jiaqing Liang, Yanghua Xiao, Deqing Yang

TL;DR

This work introduces AnalogyKB, a million-scale analogy knowledge base built by extracting analogies from seed knowledge graphs (ConceptNet and Wikidata). It formalizes two analogy types (analogies of the same relation and analogies of analogous relations) and uses large language models to discover the latter, applying two automatic filtering rules followed by manual curation. AnalogyKB contains 1,032,040 concept pairs across 943 relations, including 103 analogous relations. Training on it improves analogy recognition and generation for both small LMs and LLMs, often approaching or rivaling human performance on several benchmarks. The resource supports cross-domain reasoning and offers a scalable, semi-automatic pipeline that combines LLM-assisted discovery with quality control to bootstrap high-quality analogical data.

Abstract

Analogical reasoning is a fundamental cognitive ability of humans. However, current language models (LMs) still struggle to achieve human-like performance in analogical reasoning tasks due to a lack of resources for model training. In this work, we address this gap by proposing ANALOGYKB, a million-scale analogy knowledge base (KB) derived from existing knowledge graphs (KGs). ANALOGYKB identifies two types of analogies from the KGs: 1) analogies of the same relations, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs), followed by minor human efforts for data quality control. Evaluations on a series of datasets of two analogical reasoning tasks (analogy recognition and generation) demonstrate that ANALOGYKB successfully enables both smaller LMs and LLMs to gain better analogical reasoning capabilities.
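
The first analogy type, analogies of the same relation, can be illustrated with a minimal sketch. This is not the authors' pipeline: the triples, the relation names, and the helper functions `group_pairs_by_relation` and `same_relation_analogies` are hypothetical and chosen only for illustration. The sketch assumes a KG given as (head, relation, tail) triples and treats any two concept pairs that share a relation as an analogy.

```python
from collections import defaultdict
from itertools import combinations

# Toy triples (head, relation, tail); illustrative only, not taken from AnalogyKB.
TRIPLES = [
    ("Earth", "orbits", "Sun"),
    ("Electron", "orbits", "Nucleus"),
    ("Moon", "orbits", "Earth"),
    ("Paris", "capital_of", "France"),
    ("Tokyo", "capital_of", "Japan"),
]


def group_pairs_by_relation(triples):
    """Collect the (head, tail) concept pairs stored under each relation."""
    pairs = defaultdict(list)
    for head, relation, tail in triples:
        pairs[relation].append((head, tail))
    return dict(pairs)


def same_relation_analogies(triples):
    """Pair up every two concept pairs that share a relation (hypothetical helper)."""
    analogies = []
    for relation, pairs in group_pairs_by_relation(triples).items():
        # Each unordered combination of two pairs forms one candidate analogy.
        for (a, b), (c, d) in combinations(pairs, 2):
            analogies.append((relation, (a, b), (c, d)))
    return analogies


ANALOGIES = same_relation_analogies(TRIPLES)
for relation, (a, b), (c, d) in ANALOGIES:
    print(f"{a} : {b} :: {c} : {d}  [{relation}]")
```

Analogies of analogous relations cannot be read off the triples this way, because the two concept pairs sit under different relations; according to the abstract, that case relies on LLM-based selection and filtering plus human quality control.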

Paper Structure (19 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: An example of acquiring analogies from KGs. Based on the relational knowledge triples from KGs, i.e., facts about the solar system and an atom structure, we can discover new analogies using the corresponding relations between concepts.
  • Figure 2: The relations with concept pairs are stored in AnalogyKB. We define two types of analogies, i.e., analogies of the same relation and analogies of analogous relations, and derive them from existing KGs.
  • Figure 3: Distribution of concept categories in our AnalogyKB.
  • Figure 4: The accuracy of RoBERTa-Large trained on different data subsets on the analogy recognition task. Data denotes the dataset sampled directly from AnalogyKB, Data$_\texttt{same}$ denotes the dataset that only has same-relation analogies, and Data$_\texttt{pseudo}$ denotes the dataset with concept pairs that do not form analogies. All the datasets have the same size.