ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base
Siyu Yuan, Jiangjie Chen, Changzhi Sun, Jiaqing Liang, Yanghua Xiao, Deqing Yang
TL;DR
This work introduces AnalogyKB, a million-scale analogy knowledge base built by extracting analogies from seed knowledge graphs (ConceptNet and Wikidata). It formalizes two analogy types: analogies of the same relation and analogies of analogous relations. The latter are discovered with large language models, then refined with two automatic filtering rules and manual curation. AnalogyKB contains more than a million concept pairs (1,032,040) across 943 relations, including 103 analogous relations. Training on it improves analogy recognition and generation for both small LMs and LLMs, often approaching or rivaling human performance on several benchmarks. The resource supports cross-domain reasoning and offers a scalable, semi-automatic pipeline that combines LLM-assisted discovery with quality control to bootstrap high-quality analogical data.
Abstract
Analogical reasoning is a fundamental cognitive ability of humans. However, current language models (LMs) still struggle to achieve human-like performance in analogical reasoning tasks due to a lack of resources for model training. In this work, we address this gap by proposing ANALOGYKB, a million-scale analogy knowledge base (KB) derived from existing knowledge graphs (KGs). ANALOGYKB identifies two types of analogies from the KGs: 1) analogies of the same relation, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs), followed by minor human effort for data quality control. Evaluations on a series of datasets for two analogical reasoning tasks (analogy recognition and generation) demonstrate that ANALOGYKB enables both smaller LMs and LLMs to gain better analogical reasoning capabilities.
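
To make the first analogy type concrete, the sketch below shows one way analogies of the same relation could be read off a knowledge graph: group concept pairs by relation, then pair up any two concept pairs that share a relation. It is a minimal illustration, not the authors' extraction code. The toy triples, the relation names, and the function names `group_pairs_by_relation` and `same_relation_analogies` are all assumptions made for this example, and the LLM-based selection and filtering used for analogous relations is not covered.

```python
from collections import defaultdict
from itertools import combinations

# Toy (head, relation, tail) triples; illustrative only, not taken from AnalogyKB.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Tokyo", "capital_of", "Japan"),
    ("Rome", "capital_of", "Italy"),
    ("wheel", "part_of", "car"),
    ("wing", "part_of", "airplane"),
]


def group_pairs_by_relation(triples):
    """Map each relation to the list of (head, tail) concept pairs it connects."""
    pairs = defaultdict(list)
    for head, relation, tail in triples:
        pairs[relation].append((head, tail))
    return dict(pairs)


def same_relation_analogies(triples):
    """Return (relation, pair_a, pair_b) for every two concept pairs sharing a relation.

    Each result reads as the analogy "a_head is to a_tail as b_head is to b_tail".
    """
    analogies = []
    for relation, pairs in group_pairs_by_relation(triples).items():
        for pair_a, pair_b in combinations(pairs, 2):
            analogies.append((relation, pair_a, pair_b))
    return analogies


if __name__ == "__main__":
    for relation, (a, b), (c, d) in same_relation_analogies(TRIPLES):
        print(f"{a} : {b} :: {c} : {d}   [{relation}]")
```

Because every two pairs under a relation form a candidate analogy, the number of analogies grows quadratically with the number of concept pairs per relation, which is one reason a KB of this kind is more naturally stored as concept pairs grouped by relation than as an explicit list of analogies.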
