Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples

Qingkai Zeng, Yuyang Bai, Zhaoxuan Tan, Shangbin Feng, Zhenwen Liang, Zhihan Zhang, Meng Jiang

TL;DR

Chain-of-Layer (CoL) introduces a layer-wise, in-context learning framework for taxonomy induction that uses Hierarchical Format Taxonomy Induction Instructions (HF) and an Ensemble-based Ranking Filter to mitigate hallucinations. By splitting the task into top-down layers and validating each step with an ensemble of templates and a masked-language model, CoL achieves state-of-the-art performance on WordNet sub-taxonomies and three large-scale taxonomies, while CoL-Zero demonstrates strong cross-domain adaptability. The approach systematically improves both precision and structural coherence over single-pass prompting methods, and ablation studies highlight the complementary roles of CoL and the filter. Overall, CoL offers a scalable, interpretable paradigm for constructing coherent taxonomies from limited example sets, with practical impact for search, recommendation, and QA systems.

Abstract

Automatic taxonomy induction is crucial for web search, recommendation systems, and question answering. Manual curation of taxonomies is expensive in terms of human effort, making automatic taxonomy construction highly desirable. In this work, we introduce Chain-of-Layer, an in-context learning framework designed to induce taxonomies from a given set of entities. Chain-of-Layer breaks the task down into selecting relevant candidate entities in each layer and gradually building the taxonomy from top to bottom. To minimize errors, we introduce the Ensemble-based Ranking Filter to reduce the hallucinated content generated at each iteration. Through extensive experiments, we demonstrate that Chain-of-Layer achieves state-of-the-art performance on four real-world benchmarks.

Paper Structure (27 sections, 6 equations, 8 figures, 4 tables)

This paper contains 27 sections, 6 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Two Types of Methods for Taxonomy Induction
  • Figure 2: Overview of the Chain-of-Layer (CoL) framework: Given an entity list $\mathcal{V}$ and a root entity $v_0 \in \mathcal{V}$, CoL systematically organizes the entities in $\mathcal{V}$ into hierarchical groups, incrementally adding them to the taxonomy in a top-down manner at each iteration. In detail, at the $k$-th iteration, CoL-K selects a subset of entities $\mathcal{V}_{\text{sel}}$ from the $k$-th level and extends the existing taxonomy $\mathcal{T}^{k-1}$ with these entities. The newly generated parent-child relations ($\mathcal{T}^{k} \setminus \mathcal{T}^{k-1}$) are refined by an Ensemble-based Ranking Filter to reduce hallucinations, yielding the output taxonomy $\mathcal{T}^{k}$ of the $k$-th iteration. The process continues until all entities in $\mathcal{V}$ are integrated into the resulting taxonomy.
  • Figure 3: Prompt Overview of Chain-of-Layer Framework
  • Figure 4: The details of the Ensemble-based Ranking Filter.
  • Figure 5: Performance analysis of CoL across varying scales and domains. It shows Edge, Ancestor, and Node F1-scores for the Wiki, DBLP, and SemEval-Sci taxonomies, with sizes ranging from 20 to 160 entities. An inflection point appears at the 80-entity threshold across all metrics and domains, highlighting the scalability limitations of CoL.
  • ...and 3 more figures
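
The layer-wise loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_layer` is a hypothetical stand-in for the LLM prompting step, `score_edge` is a hypothetical stand-in for the Ensemble-based Ranking Filter (reduced here to a single score compared against an assumed `threshold`), and the toy data at the bottom is invented purely for demonstration.

```python
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def chain_of_layer(
    entities: Set[str],
    root: str,
    propose_layer: Callable[[Set[Edge], Set[str]], List[Edge]],
    score_edge: Callable[[Edge], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
    max_iters: int = 20,     # safety bound so the loop always terminates
) -> Set[Edge]:
    """Grow a taxonomy top-down, one layer per iteration."""
    taxonomy: Set[Edge] = set()
    placed: Set[str] = {root}
    remaining: Set[str] = set(entities) - placed

    for _ in range(max_iters):
        if not remaining:
            break
        # Hypothetical LLM step: propose parent-child edges for the next layer.
        candidates = propose_layer(taxonomy, remaining)

        new_children: Set[str] = set()
        for parent, child in candidates:
            # Reject edges whose parent is not yet in the taxonomy, whose child
            # is not an unplaced entity, or whose child was already attached
            # earlier in this layer.
            valid = (
                parent in placed
                and child in remaining
                and child not in new_children
            )
            # Hypothetical filter step: keep only edges that score high enough.
            if valid and score_edge((parent, child)) >= threshold:
                taxonomy.add((parent, child))
                new_children.add(child)

        if not new_children:
            break  # no progress; stop rather than loop forever
        placed |= new_children
        remaining -= new_children

    return taxonomy


# --- Toy demonstration with invented data (child -> parent) ---
GOLD = {
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}


def toy_propose(taxonomy: Set[Edge], remaining: Set[str]) -> List[Edge]:
    placed = {"animal"} | {child for _, child in taxonomy}
    edges = [(GOLD[c], c) for c in sorted(remaining) if GOLD[c] in placed]
    edges.append(("animal", "unicorn"))  # simulated hallucinated entity
    edges.append(("animal", "sparrow"))  # simulated wrong parent
    return edges


def toy_score(edge: Edge) -> float:
    parent, child = edge
    return 1.0 if GOLD.get(child) == parent else 0.0


result = chain_of_layer(set(GOLD) | {"animal"}, "animal", toy_propose, toy_score)
```

In this sketch the membership checks discard edges that mention entities outside the input list, while the scoring function discards edges with an implausible parent. Updating `placed` only after a layer is complete keeps entities proposed in the same iteration from attaching to one another, which mirrors the top-down, layer-by-layer construction described in the caption.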