Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings
Albert Sawczyn, Jakub Binkowski, Piotr Bielak, Tomasz Kajdanowicz
TL;DR
Knowledge-intensive tasks strain ML systems, and LLMs often hallucinate on them. The paper proposes a modular framework that enriches small domain-specific KGs (DKGs) by aligning and linking them to a large general-purpose KG (GKG). It computes entity representations from labels and neighborhood context, connects each DKG entity to its $k$ nearest neighbors in the GKG to form a linked KG, and trains KG completion with a weighted loss that accounts for imperfect alignments via $w_s = 1/(1 + \mathrm{distance}(x(e_i), x(e_j)))$. The approach yields up to $44.9\%$ Hits@10 improvement in synthetic, data-scarce settings and meaningful gains in real-world scenarios, depending on how suitable the GKG is. This suggests that small KGs can leverage broad knowledge, which may improve robustness and reduce hallucinations in downstream systems. The framework is modular and reproducible, enabling broader adoption of KGs in knowledge-intensive tasks and offering a practical way to enhance downstream ML systems without excessive KG-building costs.
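The linking and weighting steps summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the entity vectors $x(e)$ are already computed (the framework derives them from labels and neighborhood context), uses Euclidean distance as a stand-in for $\mathrm{distance}$, and finds neighbors by brute force. The helper names `distance`, `alignment_weight`, `link_to_gkg`, and `weighted_loss` and the toy entities are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    # Assumption: Euclidean distance stands in for the paper's distance function.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def alignment_weight(x_i: Vector, x_j: Vector) -> float:
    # w_s = 1 / (1 + distance(x(e_i), x(e_j))): identical vectors give 1.0,
    # and the weight decays toward 0 as the alignment gets less certain.
    return 1.0 / (1.0 + distance(x_i, x_j))


def link_to_gkg(
    dkg: Dict[str, Vector], gkg: Dict[str, Vector], k: int
) -> List[Tuple[str, str, float]]:
    # For every DKG entity, return (dkg_entity, gkg_entity, w_s) for its k
    # nearest GKG entities (brute-force search; hypothetical helper).
    links: List[Tuple[str, str, float]] = []
    for e_i, x_i in dkg.items():
        nearest = sorted(gkg.items(), key=lambda item: distance(x_i, item[1]))[:k]
        for e_j, x_j in nearest:
            links.append((e_i, e_j, alignment_weight(x_i, x_j)))
    return links


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    # Assumption: a plain weighted sum. Original DKG triples would use weight 1.0
    # and cross-graph link triples their w_s, so uncertain links contribute less.
    return sum(w * loss for w, loss in zip(weights, losses))


# Toy 2-D "embeddings" (hypothetical data): one DKG entity, three GKG entities.
dkg = {"aspirin": [0.0, 0.0]}
gkg = {
    "gkg:acetylsalicylic_acid": [0.0, 0.0],
    "gkg:drug": [3.0, 4.0],
    "gkg:city": [10.0, 10.0],
}
links = link_to_gkg(dkg, gkg, k=2)
```

With $k = 2$, `links` holds `("aspirin", "gkg:acetylsalicylic_acid", 1.0)` for the exact match and `("aspirin", "gkg:drug", 1/6)` for the neighbor at distance 5, while the distant `gkg:city` is not linked. Each tuple can be read as a weighted cross-graph edge in the linked KG. The plain weighted sum in `weighted_loss` is an assumption rather than the paper's exact training objective.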
Abstract
Knowledge-intensive tasks pose a significant challenge for Machine Learning (ML) techniques. Commonly adopted methods, such as Large Language Models (LLMs), often exhibit limitations when applied to such tasks. Nevertheless, there have been notable endeavours to mitigate these challenges, with a significant emphasis on augmenting LLMs with Knowledge Graphs (KGs). While KGs provide many advantages for representing knowledge, their development costs can deter extensive research and applications. To address this limitation, we introduce a framework for enriching embeddings of small-scale domain-specific KGs with well-established general-purpose KGs. With our method, a modest domain-specific KG can gain a performance boost in downstream tasks when linked to a substantial general-purpose KG. Experimental evaluations demonstrate a notable enhancement, with up to a 44% increase in the Hits@10 metric. This relatively unexplored research direction can catalyze more frequent incorporation of KGs in knowledge-intensive tasks, resulting in more robust, reliable ML implementations that hallucinate less than prevalent LLM solutions.
Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning
