Enriching Taxonomies Using Large Language Models

Zeinab Ghamlouch; Mehwish Alam

TL;DR

Taxoria is a novel taxonomy enrichment pipeline that uses an existing taxonomy as a seed and prompts a Large Language Model (LLM) to propose candidate nodes for enrichment.

Abstract

Taxonomies play a vital role in structuring and categorizing information across domains. However, many existing taxonomies suffer from limited coverage and outdated or ambiguous nodes, reducing their effectiveness in knowledge retrieval. To address this, we present Taxoria, a novel taxonomy enrichment pipeline that leverages Large Language Models (LLMs) to enhance a given taxonomy. Unlike approaches that extract internal LLM taxonomies, Taxoria uses an existing taxonomy as a seed and prompts an LLM to propose candidate nodes for enrichment. These candidates are then validated to mitigate hallucinations and ensure semantic relevance before integration. The final output includes an enriched taxonomy with provenance tracking and visualization of the final merged taxonomy for analysis.
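
The seed-propose-validate-merge loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `enrich`, the callables `propose` and `is_valid`, the breadth-first traversal order, and the `"seed"`/`"llm"` provenance labels are all assumptions made for illustration. In a real setting `propose` would wrap an LLM call and `is_valid` a relevance check.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Taxonomy = Dict[str, List[str]]  # parent label -> list of direct children


def enrich(
    seed: Taxonomy,
    root: str,
    propose: Callable[[str, List[str]], List[str]],  # hypothetical LLM wrapper
    is_valid: Callable[[str, str], bool],            # hypothetical validator
) -> Tuple[Taxonomy, Dict[str, str]]:
    """Visit each seed node, ask for candidate children, keep validated ones."""
    # Copy the seed so the input taxonomy is left untouched.
    enriched = {parent: list(children) for parent, children in seed.items()}

    # Every node already present is recorded as coming from the seed.
    provenance: Dict[str, str] = {root: "seed"}
    for parent, children in seed.items():
        provenance.setdefault(parent, "seed")
        for child in children:
            provenance.setdefault(child, "seed")

    queue = deque([root])
    visited = set()
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        children = enriched.setdefault(node, [])
        for candidate in propose(node, list(children)):
            if candidate in provenance:
                continue  # already in the taxonomy, nothing to add
            if not is_valid(node, candidate):
                continue  # rejected, e.g. a suspected hallucination
            children.append(candidate)
            provenance[candidate] = "llm"
        # Assumption: only nodes of the seed taxonomy are expanded.
        queue.extend(seed.get(node, []))
    return enriched, provenance
```

Passing the LLM call and the validator in as plain callables keeps the sketch runnable with stubs and makes it easy to swap in a different model or relevance check.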

Paper Structure (16 sections, 2 figures, 1 table)

Related Work
Overall Architecture
Taxonomy Traversal Strategy.
Prompting LLM.
Mitigating Over-generations LLM.
Merging process
Node Generation.
Validation of Node Relevance.
Merging Nodes.
Adding Class Provenance.
Implementation Details
Use-case and Impact
Real-World Taxonomy Enrichment.
Analysis.
Impact.
...and 1 more sections

Figures (2)

Figure 1: Overall architecture of Taxoria. The blue node represents the current node in the seed taxonomy for which the direct children are generated and the green node represents the newly added node after the merging process.
Figure 2: This figure illustrates an example of merging two hierarchical taxonomies in the technology domain. Nodes common to both input taxonomies are shown in green, while nodes unique to each taxonomy are marked in blue and red, respectively.
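
The node grouping described in the Figure 2 caption (common nodes versus nodes unique to each input) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `all_nodes`, `classify_nodes`, and `merge`, the parent-to-children dictionary representation, and the example labels are hypothetical, and matching nodes by exact label is a simplification.

```python
from typing import Dict, List, Set

Taxonomy = Dict[str, List[str]]  # parent label -> list of direct children


def all_nodes(taxonomy: Taxonomy) -> Set[str]:
    """Collect every label that appears as a parent or as a child."""
    nodes = set(taxonomy)
    for children in taxonomy.values():
        nodes.update(children)
    return nodes


def classify_nodes(first: Taxonomy, second: Taxonomy) -> Dict[str, Set[str]]:
    """Group nodes by color as in the caption: green, blue, red."""
    a, b = all_nodes(first), all_nodes(second)
    return {
        "green": a & b,  # common to both inputs
        "blue": a - b,   # only in the first taxonomy
        "red": b - a,    # only in the second taxonomy
    }


def merge(first: Taxonomy, second: Taxonomy) -> Taxonomy:
    """Union of parent-child edges, keeping first-seen child order."""
    merged: Taxonomy = {}
    for taxonomy in (first, second):
        for parent, children in taxonomy.items():
            slot = merged.setdefault(parent, [])
            for child in children:
                if child not in slot:
                    slot.append(child)
    return merged


# Hypothetical technology-domain example.
first = {"Technology": ["Software", "Hardware"]}
second = {"Technology": ["Software", "Networking"]}
groups = classify_nodes(first, second)
merged = merge(first, second)
```

Here `groups` separates the shared nodes from those contributed by only one input, and `merged` holds the combined hierarchy that a visualization could then color accordingly.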
