ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base

Siyu Yuan; Jiangjie Chen; Changzhi Sun; Jiaqing Liang; Yanghua Xiao; Deqing Yang

TL;DR

This work introduces AnalogyKB, a million-scale analogy knowledge base built by extracting analogies from seed knowledge graphs (ConceptNet and Wikidata). It formalizes two analogy types, analogies of the same relation and analogies of analogous relations, and discovers the latter with large language models, two automatic filtering rules, and manual curation. AnalogyKB contains 1,032,040 concept pairs across 943 relations, including 103 analogous relations. Training on it improves analogy recognition and generation for both smaller LMs and LLMs, often approaching or rivaling human performance on several benchmarks. The resource is valuable for cross-domain reasoning and offers a scalable, semi-automatic pipeline that combines LLM-assisted discovery with quality control to bootstrap high-quality analogical data.
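The simpler of the two analogy types can be illustrated in a few lines. The sketch below assumes the KG is available as (head, relation, tail) triples and treats any two distinct concept pairs linked by the same relation as an analogy A:B::C:D. The triples (loosely inspired by the solar-system/atom example of Figure 1, not taken from ConceptNet or Wikidata), the function names, and the exhaustive pairing are illustrative assumptions; any cleaning or sampling the authors apply is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative (head, relation, tail) triples; made-up examples in the spirit
# of Figure 1, not data extracted from ConceptNet or Wikidata.
TRIPLES = [
    ("planet", "revolves_around", "sun"),
    ("electron", "revolves_around", "nucleus"),
    ("moon", "revolves_around", "planet"),
    ("sun", "part_of", "solar system"),
    ("nucleus", "part_of", "atom"),
]


def group_by_relation(triples):
    """Map each relation to the list of (head, tail) concept pairs it links."""
    pairs = defaultdict(list)
    for head, relation, tail in triples:
        pairs[relation].append((head, tail))
    return dict(pairs)


def same_relation_analogies(pairs_by_relation):
    """Yield (relation, A:B, C:D) for every two distinct pairs sharing a relation."""
    for relation, pairs in pairs_by_relation.items():
        for (a, b), (c, d) in combinations(pairs, 2):
            yield relation, (a, b), (c, d)


if __name__ == "__main__":
    kb = group_by_relation(TRIPLES)
    for rel, (a, b), (c, d) in same_relation_analogies(kb):
        print(f"{a}:{b} :: {c}:{d}  [{rel}]")
```

Because the number of same-relation analogies grows quadratically with the number of pairs per relation, it is cheaper to store concept pairs grouped by relation, as Figure 2 describes for AnalogyKB, and generate individual analogies on demand.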

Abstract

Analogical reasoning is a fundamental cognitive ability of humans. However, current language models (LMs) still struggle to achieve human-like performance in analogical reasoning tasks due to a lack of resources for model training. In this work, we address this gap by proposing ANALOGYKB, a million-scale analogy knowledge base (KB) derived from existing knowledge graphs (KGs). ANALOGYKB identifies two types of analogies from the KGs: 1) analogies of the same relation, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs), followed by minor human effort for data quality control. Evaluations on a series of datasets for two analogical reasoning tasks (analogy recognition and generation) demonstrate that ANALOGYKB enables both smaller LMs and LLMs to gain better analogical reasoning capabilities.
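The second analogy type pairs concept pairs across two different relations once those relations have been judged analogous. The sketch below assumes that judgment is already available as a set of relation pairs; the LLM-enabled selection and filtering pipeline and the human checks that produce it in AnalogyKB are not modeled. The relation names (`wrote`, `composed`, `capital_of`), toy data, and function name are hypothetical.

```python
from itertools import product

# Hypothetical inputs: concept pairs grouped by relation, and relation pairs
# already judged analogous. In AnalogyKB that judgment comes from an
# LLM-enabled selection and filtering pipeline plus human checks; here it is
# simply hard-coded for illustration.
PAIRS_BY_RELATION = {
    "wrote": [("Tolkien", "The Hobbit"), ("Orwell", "Nineteen Eighty-Four")],
    "composed": [("Beethoven", "Symphony No. 9")],
    "capital_of": [("Paris", "France")],
}
ANALOGOUS_RELATIONS = {("wrote", "composed")}


def analogous_relation_analogies(pairs_by_relation, analogous_relations):
    """Yield ((r1, r2), A:B, C:D) with A:B linked by r1 and C:D linked by r2."""
    for r1, r2 in sorted(analogous_relations):
        for (a, b), (c, d) in product(pairs_by_relation.get(r1, []),
                                      pairs_by_relation.get(r2, [])):
            yield (r1, r2), (a, b), (c, d)


if __name__ == "__main__":
    for rels, (a, b), (c, d) in analogous_relation_analogies(
            PAIRS_BY_RELATION, ANALOGOUS_RELATIONS):
        print(f"{a}:{b} :: {c}:{d}  {rels}")
```

Here `capital_of` contributes nothing, since it has no analogous partner. Unlike same-relation analogies, these depend on validating the relation pair itself, which is why the LLM prediction and human quality control sit at this step.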

Paper Structure (19 sections, 5 figures, 5 tables)

This paper contains 19 sections, 5 figures, and 5 tables.

Introduction
Related Work
Analogy Acquisition
Analogical Reasoning
Knowledge Base Construction
AnalogyKB Construction
Schema for Analogies in AnalogyKB
Source Data Collection
Acquiring Analogies of the Same Relation
Acquiring Analogies of Analogous Relations
Finding Candidate Relation Pairs
Predicting Analogous Relation Pairs
Filtering for High-quality Relation Pairs
Analysis of AnalogyKB
Are the filtering techniques for analogous relations useful?
...and 4 more sections

Figures (5)

Figure 1: An example of acquiring analogies from KGs. Based on the relational knowledge triples from KGs, i.e., facts about the solar system and an atom structure, we can discover new analogies using the corresponding relations between concepts.
Figure 2: The relations with concept pairs are stored in AnalogyKB. We define two types of analogies, i.e., analogies of the same relation and analogies of analogous relations, and derive them from existing KGs.
Figure 3: Distribution of concept categories in our AnalogyKB.
Figure 4: The accuracy of RoBERTa-Large trained on different data subsets on the analogy recognition task. Data denotes the dataset sampled directly from AnalogyKB, Data$_\texttt{same}$ denotes the dataset that only has same-relation analogies, and Data$_\texttt{pseudo}$ denotes the dataset with concept pairs that do not form analogies. All the datasets have the same size.
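Figure 4 compares three equal-size training sets. The sketch below shows one plausible way to assemble such subsets from relation-grouped concept pairs. How the authors actually sampled them is not described here, so the sampling scheme, the choice to treat pairs from two different non-analogous relations as the non-analogy ("pseudo") pool, the toy data, and all function names are assumptions.

```python
import random
from itertools import combinations, product

# Toy data (hypothetical): concept pairs grouped by relation, plus the set of
# relation pairs treated as analogous.
PAIRS_BY_RELATION = {
    "wrote": [("Tolkien", "The Hobbit"), ("Orwell", "Nineteen Eighty-Four")],
    "composed": [("Beethoven", "Symphony No. 9"), ("Mozart", "The Magic Flute")],
    "capital_of": [("Paris", "France"), ("Rome", "Italy")],
}
ANALOGOUS = {("wrote", "composed")}


def same_relation_items(pairs_by_relation):
    """Two distinct concept pairs that share one relation."""
    items = []
    for pairs in pairs_by_relation.values():
        items.extend(combinations(pairs, 2))
    return items


def analogous_relation_items(pairs_by_relation, analogous):
    """One concept pair from r1 and one from r2, where (r1, r2) is analogous."""
    items = []
    for r1, r2 in sorted(analogous):
        items.extend(product(pairs_by_relation.get(r1, []),
                             pairs_by_relation.get(r2, [])))
    return items


def pseudo_items(pairs_by_relation, analogous):
    """Concept pairs from two different relations NOT marked as analogous.

    Assumption: these are treated as non-analogies, although a few may be
    analogous in reality.
    """
    items = []
    for r1, r2 in combinations(sorted(pairs_by_relation), 2):
        if (r1, r2) in analogous or (r2, r1) in analogous:
            continue
        items.extend(product(pairs_by_relation[r1], pairs_by_relation[r2]))
    return items


def build_subsets(pairs_by_relation, analogous, size, seed=0):
    """Sample three equal-size pools in the spirit of Data, Data_same, Data_pseudo."""
    same = same_relation_items(pairs_by_relation)
    full = same + analogous_relation_items(pairs_by_relation, analogous)
    pseudo = pseudo_items(pairs_by_relation, analogous)
    if size > min(len(full), len(same), len(pseudo)):
        raise ValueError("size exceeds the smallest candidate pool")
    rng = random.Random(seed)
    return {
        "Data": rng.sample(full, size),
        "Data_same": rng.sample(same, size),
        "Data_pseudo": rng.sample(pseudo, size),
    }


if __name__ == "__main__":
    for name, items in build_subsets(PAIRS_BY_RELATION, ANALOGOUS, size=3).items():
        print(name, items)
```

Holding the size fixed across the three pools, as the caption notes, means any accuracy difference reflects the kind of analogies in the data rather than the amount of data.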