Taxonomy Tree Generation from Citation Graph

Yuntong Hu; Zhuofeng Li; Zheng Zhang; Chen Ling; Raasikh Kanjiani; Boxin Zhao; Liang Zhao

TL;DR

The paper tackles automatic taxonomy generation from citation graphs to organize scientific knowledge and support literature reviews. It introduces HiGTL (Hierarchical Graph Taxonomy Learning), an end-to-end framework that jointly learns hierarchical clustering of papers and verbalizes taxonomy nodes through iterative graph-to-text generation guided by user prompts, with an explicit decomposition $f = h\circ g$. Key contributions include a hierarchical clustering module with CLU and AGG operators and a hierarchical contrastive loss $\mathcal{L}_{\text{HiMulCon}}$, a hierarchical taxonomy node verbalization objective $\mathcal{L}_{\text{Gen}}$ driven by an LLM, and a two-phase optimization leveraging pretraining and LoRA fine-tuning. Experiments on 518 citation graphs from computer science literature reviews demonstrate state-of-the-art taxonomy quality (e.g., Coverage $=0.9357$, Structure $=0.9413$, BertScore $=0.8694$) and superior taxonomy-guided literature review generation (HiReview) compared to baselines, confirming the framework’s practical impact for knowledge discovery, trend identification, and scalable literature synthesis.
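The TL;DR names a hierarchical contrastive loss $\mathcal{L}_{\text{HiMulCon}}$, but this summary does not give its formula. As a rough illustration only, the sketch below assumes it behaves like a level-weighted supervised contrastive loss: at each level of the taxonomy, papers assigned to the same cluster are treated as positives and all other papers as negatives, and the per-level losses are summed with weights. The functions `supcon_loss` and `hierarchical_contrastive_loss`, the temperature `tau`, and the weights `level_weights` are hypothetical and are not the authors' definition.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss for ONE level of cluster labels (assumed form).

    z: (n, d) paper embeddings; labels: (n,) cluster id of each paper at this level.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Log-softmax of each similarity over all OTHER papers a != i, computed stably.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # Positives: other papers that share the same cluster at this level.
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                                   # anchors with at least one positive
    if not valid.any():
        return 0.0
    mean_log_prob_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())

def hierarchical_contrastive_loss(z, level_labels, level_weights, tau=0.1):
    """Hypothetical hierarchical loss: weighted sum of per-level contrastive losses.

    level_labels[l][i] is the cluster of paper i at taxonomy level l.
    """
    return sum(w * supcon_loss(z, np.asarray(y), tau)
               for y, w in zip(level_labels, level_weights))

# Toy usage: four papers, a root level containing all of them, then two sub-topics.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss = hierarchical_contrastive_loss(z, [[0, 0, 0, 0], [0, 0, 1, 1]], level_weights=[0.5, 1.0])
print(loss)
```

Labels that agree with the embedding geometry (papers 0 and 1 together, 2 and 3 together) give a much lower loss than labels that pair distant papers, which is the behaviour a clustering-aware contrastive objective is meant to reward. How the paper actually weights levels is not stated here.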

Abstract

Constructing taxonomies from citation graphs is essential for organizing scientific knowledge, facilitating literature reviews, and identifying emerging research trends. However, manual taxonomy construction is labor-intensive, time-consuming, and prone to human biases, often overlooking pivotal but less-cited papers. In this paper, to enable automatic hierarchical taxonomy generation from citation graphs, we propose HiGTL (Hierarchical Graph Taxonomy Learning), a novel end-to-end framework guided by human-provided instructions or preferred topics. Specifically, we propose a hierarchical citation graph clustering method that recursively groups related papers based on both textual content and citation structure, ensuring semantically meaningful and structurally coherent clusters. Additionally, we develop a novel taxonomy node verbalization strategy that iteratively generates central concepts for each cluster, leveraging a pre-trained large language model (LLM) to maintain semantic consistency across hierarchical levels. To further enhance performance, we design a joint optimization framework that fine-tunes both the clustering and concept generation modules, aligning structural accuracy with the quality of generated taxonomies. Extensive experiments demonstrate that HiGTL effectively produces coherent, high-quality taxonomies.
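The abstract describes clustering that recursively groups papers using both textual content and citation structure, and the TL;DR names two operators, CLU and AGG. The sketch below shows one minimal way such a loop could work, under assumptions that are ours rather than the paper's: the hypothetical `clu` averages each paper's text embedding with those of its citation neighbours and then runs k-means to obtain a hard assignment matrix $S$, and the hypothetical `agg` mean-pools member embeddings and coarsens the citation graph as $S^\top A S$. Applying the pair once per level yields one assignment matrix per taxonomy level; the cluster counts `ks` are illustrative inputs.

```python
import numpy as np

def clu(x, adj, k, iters=20, seed=0):
    """Hypothetical CLU: smooth text features over citation edges, then k-means.

    Returns a hard assignment matrix S of shape (n, k') with k' <= k non-empty clusters.
    """
    n = x.shape[0]
    a_hat = adj + np.eye(n)                               # add self-loops
    h = a_hat @ x / a_hat.sum(axis=1, keepdims=True)      # mean over paper and its neighbours
    rng = np.random.default_rng(seed)
    centers = h[rng.choice(n, size=k, replace=False)]     # fancy indexing returns a copy
    for _ in range(iters):                                # Lloyd's k-means iterations
        dist = ((h[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = h[assign == c].mean(axis=0)
    used = np.unique(assign)                              # drop empty clusters
    return (assign[:, None] == used[None, :]).astype(float)

def agg(x, adj, s):
    """Hypothetical AGG: mean-pool member features and sum citations between clusters."""
    x_next = s.T @ x / s.sum(axis=0)[:, None]
    adj_next = s.T @ adj @ s
    np.fill_diagonal(adj_next, 0.0)                       # ignore intra-cluster citations
    return x_next, adj_next

def hierarchical_clustering(x, adj, ks):
    """Apply CLU then AGG once per level; return the assignment matrix of every level."""
    levels = []
    for k in ks:
        s = clu(x, adj, k)
        levels.append(s)
        x, adj = agg(x, adj, s)
    return levels

# Toy example: two topical groups of three papers that only cite within their group.
x = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])
adj = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1.0
levels = hierarchical_clustering(x, adj, ks=[2, 1])
print([s.shape for s in levels])  # [(6, 2), (2, 1)]
```

Hard assignments keep the result a strict tree, since every paper or cluster has exactly one parent. A trainable variant would replace k-means with soft, differentiable assignments so that the clustering could be optimized jointly with the generation objective, in the spirit of the joint optimization the abstract describes.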

Paper Structure (34 sections, 16 equations, 8 figures, 6 tables)

Introduction
Related Work
Taxonomy Learning
Hierarchical Graph Clustering
Parameter-Efficient Fine-Tuning (PEFT)
Problem Formalization
Methodology
Overview
Hierarchical Citation Graph Clustering
Clustering Operator $\text{CLU}(\cdot)$.
Aggregation Operator $\text{AGG}(\cdot)$.
Hierarchical Citation Graph Clustering Objective.
Hierarchical Taxonomy Node Verbalization
Iterative Generation
Hierarchical Generation Objective
...and 19 more sections

Figures (8)

Figure 1: The citation graph and its taxonomy exhibit a clear hierarchical mapping.
Figure 2: Overall framework of proposed hierarchical taxonomy generation.
Figure 3: Taxonomy Tree Generated by HiGTL.
Figure 4: Taxonomy Tree Built by zheng2024towards.
Figure 5: Prompt used for evaluating coverage with LLMs.
...and 3 more figures
