SUBMASSIVE: Resolving Subclass Cycles in Very Large Knowledge Graphs
Shuai Wang, Peter Bloem, Joe Raad, Frank van Harmelen
TL;DR
This work tackles erroneous cyclic subclass relations in very large knowledge graphs, which hinder reliable subclass hierarchies and transitive closures. It introduces SUBMASSIVE, a hybrid approach that combines data pre-processing with an iterative, local cycle-resolving procedure implemented via a MAXSAT formulation to remove a minimal set of rdfs:subClassOf edges. Evaluated on the LOD-a-lot dataset, the method demonstrates scalability and reveals a trade-off between the number of removed relations and computation time, achieving substantial cycle reduction with manageable runtime under controlled bounds. The resulting cycle-free hierarchy enables accurate transitive reasoning and can support downstream machine learning tasks on large-scale knowledge graphs. This work advances KG refinement by providing an anytime, scalable cycle-resolving method and releasing processed data for reproducibility.
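The TL;DR describes a local cycle-resolving procedure that runs after data pre-processing. One standard way to localise the problem, offered here as an illustrative assumption rather than as SUBMASSIVE's documented pre-processing, is to split the subclass graph into strongly connected components (SCCs). Every directed cycle lies inside a single SCC, so only components with two or more classes need further work. The sketch below applies Tarjan's algorithm to (subclass, superclass) pairs; the function name cyclic_components is hypothetical.

```python
from collections import defaultdict


def cyclic_components(edges):
    """Return the strongly connected components (SCCs) that contain a cycle.

    `edges` holds (subclass, superclass) pairs, one per rdfs:subClassOf triple.
    Self-loops are skipped: under RDFS semantics rdfs:subClassOf is reflexive,
    so a triple (C, C) is harmless. Every remaining cycle lies entirely inside
    one SCC with two or more classes, so only those SCCs are returned.
    """
    graph = defaultdict(list)
    nodes = set()
    for sub, sup in edges:
        if sub != sup:
            graph[sub].append(sup)
            nodes.update((sub, sup))

    index, low = {}, {}          # DFS visit order and lowest reachable index
    stack, on_stack = [], set()  # Tarjan's stack of not-yet-assigned nodes
    components = []

    def strongconnect(v):
        order = len(index)
        index[v] = low[v] = order
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC: pop all of its members off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in sorted(nodes):
        if v not in index:
            strongconnect(v)
    return [c for c in components if len(c) > 1]


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "E")]
print(cyclic_components(edges))
```

Here only {A, B, C} is returned: D and E are not on any cycle once the reflexive triple (E, E) is ignored. Recursion keeps the sketch short; a graph on the scale of LOD-a-lot would need an iterative traversal to stay within Python's recursion limit.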
Abstract
Large knowledge graphs capture information about a large number of entities and their relations. Among the many relations they capture, class subsumption assertions are usually present and are expressed using the rdfs:subClassOf construct. Our examination shows that publicly available knowledge graphs contain many potentially erroneous cyclic subclass relations, a problem that can be exacerbated when different knowledge graphs are integrated as Linked Open Data. In this paper, we present an automatic approach for resolving such cycles at scale using automated reasoning, by encoding cycle resolution as a problem for a MAXSAT solver. The approach is tested on the LOD-a-lot dataset and compared against a semi-automatic version of our algorithm. We show that the number of removed triples trades off against the efficiency of the algorithm.
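The abstract encodes cycle resolution for a MAXSAT solver. What follows is a minimal sketch of one such encoding, assuming Boolean variables x_i that mean "subclass edge i is removed". Each simple cycle yields a hard clause requiring at least one of its edges to be removed. Each edge yields a soft clause of weight 1 that prefers keeping it. An optimal assignment therefore removes as few edges as possible while leaving the graph acyclic. The helper names (simple_cycles, encode_maxsat, solve_by_enumeration) are hypothetical, and the paper's actual encoding may differ, for example in how cycles are collected or weighted. Instead of calling a real MAXSAT solver, solve_by_enumeration tries removal sets in order of increasing size. This reaches the same optimum on toy inputs but takes exponential time.

```python
from collections import defaultdict
from itertools import combinations


def _cycles_from(start, node, succ, path, on_path, out):
    """Depth-first search for simple cycles whose smallest node is `start`."""
    for nxt in sorted(succ[node]):
        if nxt == start:
            out.append(path + [(node, start)])
        elif nxt > start and nxt not in on_path:
            # Only visit nodes larger than `start`, so each cycle is
            # reported exactly once, rooted at its smallest node.
            on_path.add(nxt)
            _cycles_from(start, nxt, succ, path + [(node, nxt)], on_path, out)
            on_path.remove(nxt)


def simple_cycles(edges):
    """Enumerate every simple directed cycle as a list of (sub, super) edges."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    cycles = []
    for start in nodes:
        _cycles_from(start, start, succ, [], {start}, cycles)
    return cycles


def encode_maxsat(edges):
    """Build a partial MAXSAT instance; variable i+1 means edge_list[i] is removed.

    Literals follow the DIMACS convention: +i is x_i, -i is (not x_i).
    Self-loops are dropped first, since a reflexive subClassOf triple is harmless.
    """
    edge_list = sorted({(u, v) for u, v in edges if u != v})
    var = {e: i + 1 for i, e in enumerate(edge_list)}
    hard = [[var[e] for e in cycle] for cycle in simple_cycles(edge_list)]
    soft = [[-var[e]] for e in edge_list]  # each of weight 1: keep this edge
    return edge_list, hard, soft


def solve_by_enumeration(edges):
    """Exhaustive stand-in for a MAXSAT solver; suitable for toy inputs only.

    Removing k edges violates exactly k soft clauses, so the first k for
    which some removal set satisfies all hard clauses is optimal.
    """
    edge_list, hard, _soft = encode_maxsat(edges)
    variables = range(1, len(edge_list) + 1)
    for k in range(len(edge_list) + 1):
        for removed in combinations(variables, k):
            chosen = set(removed)
            if all(any(lit in chosen for lit in clause) for clause in hard):
                return {edge_list[i - 1] for i in chosen}


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "A")]
print(solve_by_enumeration(edges))
```

In this toy graph the cycles A→B→A and A→B→C→A share the edge (A, B), so removing that single subclass relation breaks both. The number of simple cycles can grow exponentially with graph size, so this global enumeration is only meant to show the shape of the encoding on small examples.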
