
Insert or Attach: Taxonomy Completion via Box Embedding

Wei Xue, Yongliang Shen, Wenqi Ren, Jietian Guo, Shiliang Pu, Weiming Lu

TL;DR

TaxBox tackles taxonomy completion by enabling insertion and attachment of new concepts within a box-embedding space. It integrates a structurally enhanced box decoder, two specialized scorers (insertion and attachment), and three learning objectives (box-constrained loss, dynamic ranking loss, and a classification-type objective) to capture fine-grained hierarchical relations without relying on pseudo-leaves. The approach demonstrates strong, consistent improvements across six real-world datasets over strong baselines, validating the advantages of box geometry for asymmetric is-a relations and multiple-parent scenarios. This has practical impact for knowledge graphs and downstream tasks requiring robust, nuanced taxonomy integration without leaf-imputation biases.
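
The asymmetry argument can be made concrete with a minimal sketch. It assumes hard axis-aligned boxes stored as lower and upper corners. The names `Box`, `volume`, `intersection`, and `containment` are illustrative and do not come from TaxBox. Containment is measured as the share of one box's volume that lies inside another, so swapping the arguments changes the score. A symmetric vector distance cannot do this. Trained box models usually replace the hard volume with a smoothed one so that disjoint boxes still receive gradients.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned box stored as its lower and upper corners (illustrative)."""
    lo: List[float]
    hi: List[float]


def volume(box: Box) -> float:
    # Hard (non-smoothed) volume: product of side lengths, clipped at zero,
    # so a box with a negative side length counts as empty.
    v = 1.0
    for l, h in zip(box.lo, box.hi):
        v *= max(h - l, 0.0)
    return v


def intersection(a: Box, b: Box) -> Box:
    # The overlap of two axis-aligned boxes is again a box (possibly empty).
    return Box([max(x, y) for x, y in zip(a.lo, b.lo)],
               [min(x, y) for x, y in zip(a.hi, b.hi)])


def containment(child: Box, parent: Box) -> float:
    # Share of the child's volume inside the parent, read as a soft P(child is-a parent).
    return volume(intersection(child, parent)) / volume(child)


animal = Box([0.0, 0.0], [4.0, 4.0])
dog = Box([1.0, 1.0], [2.0, 2.0])
print(containment(dog, animal))  # 1.0
print(containment(animal, dog))  # 0.0625
```

The two printed values differ because the relation is directional. "Dog" lies entirely inside "animal", but only a sixteenth of "animal" lies inside "dog".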

Abstract

Taxonomy completion, enriching existing taxonomies by inserting new concepts as parents or attaching them as children, has gained significant interest. Previous approaches embed concepts as vectors in Euclidean space, which makes it difficult to model asymmetric relations in taxonomies. In addition, they introduce pseudo-leaves to convert attachment cases into insertion cases. This biases network learning, because training becomes dominated by the numerous pseudo-leaves. To address these issues, our framework, TaxBox, leverages box containment and center closeness to design two specialized geometric scorers within the box embedding space. These scorers are tailored for insertion and attachment operations. They can effectively capture intrinsic relationships between concepts by optimizing a granular box constraint loss. We employ a dynamic ranking loss mechanism to balance the scores from these scorers, allowing adaptive adjustments of insertion and attachment scores. Experiments on four real-world datasets show that TaxBox significantly outperforms previous methods, with average performance boosts of 6.7%, 34.9%, and 51.4% in MRR, Hit@1, and Prec@1, respectively.
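
The ranking objective can be illustrated with a plain hinge (margin) ranking loss. This is a simpler stand-in for the dynamic ranking loss described above. Its adaptive balancing of insertion and attachment scores is not reproduced here. In this sketch, the score of the true position should exceed the score of each negative position by at least `margin`. The function name and the default margin of 0.5 are assumptions.

```python
from typing import Sequence


def margin_ranking_loss(pos_score: float, neg_scores: Sequence[float],
                        margin: float = 0.5) -> float:
    """Average hinge penalty over negatives that are not at least `margin`
    below the positive score (plain ranking loss, not TaxBox's dynamic one)."""
    if not neg_scores:
        return 0.0
    penalties = [max(0.0, margin - (pos_score - s)) for s in neg_scores]
    return sum(penalties) / len(penalties)


# The 0.25 negative is already far enough below; the 0.75 negative is 0.25 short.
print(margin_ranking_loss(1.0, [0.25, 0.75]))  # 0.125
```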

Paper Structure

This paper contains 21 sections, 15 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Example of taxonomy completion with our TaxBox framework.
  • Figure 2: Overview of the TaxBox architecture. (a) The seed taxonomy tree with a query concept. (b) A structurally enhanced box decoder maps all candidate concepts and the query concept into the box embedding space. (c) Two probabilistic scorers compute the confidence of insertion or attachment for each candidate position. (d) The best position is found via ranking, completing the seed taxonomy with the novel concept in box embedding space.
  • Figure 3: Details of the graph aggregation module.
  • Figure 4: The effect of box dimensionality and the number of negative samples over three datasets.
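
Steps (c) and (d) of Figure 2 can be sketched as scoring every candidate position and taking the best one. This is a minimal sketch under several assumptions:

  • A candidate is a (parent, child) pair, with child set to None for attachment.
  • The insertion scorer multiplies two containment terms: the query inside the parent, and the child inside the query.
  • The attachment scorer multiplies containment with an exp(-distance) center-closeness term.

These scorer formulas are hypothetical combinations of the two geometric cues named in the abstract, not the paper's equations. All function names are illustrative.

```python
import math
from typing import Dict, List, Optional, Tuple

# A box is (lower corner, upper corner); plain tuples keep the sketch short.
Box = Tuple[List[float], List[float]]


def volume(b: Box) -> float:
    # Hard volume: product of side lengths, clipped at zero for empty boxes.
    v = 1.0
    for l, h in zip(*b):
        v *= max(h - l, 0.0)
    return v


def containment(child: Box, parent: Box) -> float:
    # Share of the child's volume inside the parent (1.0 = fully contained).
    lo = [max(a, b) for a, b in zip(child[0], parent[0])]
    hi = [min(a, b) for a, b in zip(child[1], parent[1])]
    return volume((lo, hi)) / volume(child)


def center_closeness(a: Box, b: Box) -> float:
    # exp(-Euclidean distance between box centers); 1.0 when centers coincide.
    ca = [(l + h) / 2 for l, h in zip(*a)]
    cb = [(l + h) / 2 for l, h in zip(*b)]
    return math.exp(-math.dist(ca, cb))


def insertion_score(q: Box, parent: Box, child: Box) -> float:
    # Hypothetical scorer: q should sit inside the parent and enclose the child.
    return containment(q, parent) * containment(child, q)


def attachment_score(q: Box, parent: Box) -> float:
    # Hypothetical scorer: q should sit inside the parent, near its center.
    return containment(q, parent) * center_closeness(q, parent)


def best_position(q: Box, boxes: Dict[str, Box],
                  candidates: List[Tuple[str, Optional[str]]]) -> Tuple[str, Optional[str]]:
    """Rank (parent, child) candidates; child=None means attach q as a new leaf."""
    def score(pos: Tuple[str, Optional[str]]) -> float:
        parent, child = pos
        if child is None:
            return attachment_score(q, boxes[parent])
        return insertion_score(q, boxes[parent], boxes[child])
    return max(candidates, key=score)


boxes = {"animal": ([0.0, 0.0], [8.0, 8.0]),
         "dog": ([1.0, 1.0], [2.0, 2.0]),
         "cat": ([5.0, 5.0], [6.0, 6.0])}
candidates = [("animal", "dog"), ("animal", "cat"), ("animal", None),
              ("dog", None), ("cat", None)]
puppy = ([1.25, 1.25], [1.75, 1.75])  # small box centered inside "dog"
print(best_position(puppy, boxes, candidates))  # ('dog', None)
```

Taking the maximum over all candidates lets one model handle both operations:

  • A query box that encloses an existing concept wins through the insertion scorer. For example, ([0.5, 0.5], [3.0, 3.0]) would be placed between "animal" and "dog".
  • A small box near a parent's center wins through the attachment scorer.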