When LLMs meet open-world graph learning: a new perspective for unlabeled data uncertainty

Yanzhe Wen, Xunkai Li, Qi Zhang, Zhu Lei, Guang Zeng, Rong-Hua Li, Guoren Wang

TL;DR

This work tackles unlabeled data uncertainty in text-attributed graphs under open-world conditions by introducing Open-world Graph Assistant (OGA), a fully automated pipeline that combines adaptive label traceability (ALT) with a Graph Label Annotator (GLA). ALT creates a compact, discriminative ontology space by fusing semantic and topological signals through a graph propagation framework and an entropy-based rejection mechanism, underpinned by theoretical guarantees of low rank and bounded uncertainty. GLA uses structure-guided prompts and multi-granularity annotation to distill and fuse labels across communities, substantially reducing the number of LLM calls while generating coherent unknown-class annotations for model retraining. Across nine diverse datasets, OGA achieves state-of-the-art performance in unknown-class rejection and improves downstream GNN performance when retrained with GLA-generated annotations, demonstrating a practical, scalable approach to open-world graph learning with limited labeling.
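As a rough illustration of how graph propagation and entropy-based rejection can fit together, the following sketch smooths node text embeddings over the graph, builds one concept vector per known class from the labeled nodes, and flags a node as unknown when the normalized entropy of its class distribution exceeds a threshold. It is a minimal sketch under stated assumptions, not the authors' ALT: the GCN-style normalized propagation, mean-pooled class concepts, cosine-similarity softmax, and the names `propagate`, `class_concepts`, `entropy_reject`, `tau`, and `temperature` are all hypothetical choices.

```python
import numpy as np

# Hypothetical sketch: NOT the authors' ALT implementation.

def propagate(X, A, k=2):
    """Smooth node features with k steps of GCN-style normalized propagation."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    P = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2}(A+I)D^{-1/2}
    H = X.copy()
    for _ in range(k):
        H = P @ H
    return H

def class_concepts(H, labels, num_classes):
    """Mean propagated embedding per known class, L2-normalized.

    Assumes every known class has at least one labeled node (label -1 = unlabeled).
    """
    C = np.stack([H[labels == c].mean(axis=0) for c in range(num_classes)])
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def entropy_reject(H, C, tau=0.9, temperature=0.1):
    """Return (predicted class, is_unknown) from normalized entropy over cosine scores."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    logits = (Hn @ C.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Entropy divided by log(K) so it lies in [0, 1] regardless of class count.
    ent = -(p * np.log(p + 1e-12)).sum(axis=1) / np.log(C.shape[0])
    return p.argmax(axis=1), ent > tau
```

In use, with `labels` holding -1 for unlabeled nodes, `H = propagate(X, A)`, `C = class_concepts(H, labels, K)`, and `pred, is_unknown = entropy_reject(H, C)` give a tentative known-class prediction and a rejection flag per node; the flagged nodes are the natural candidates to hand to an annotator.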

Abstract

Recently, large language models (LLMs) have significantly advanced text-attributed graph (TAG) learning. However, existing methods inadequately handle data uncertainty in open-world scenarios, especially concerning limited labeling and unknown-class nodes. Prior solutions typically rely on isolated semantic or structural approaches for unknown-class rejection, lacking effective annotation pipelines. To address these limitations, we propose Open-world Graph Assistant (OGA), an LLM-based framework that combines adaptive label traceability, which integrates semantics and topology for unknown-class rejection, and a graph label annotator to enable model updates using newly annotated nodes. Comprehensive experiments demonstrate OGA's effectiveness and practicality.
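To make the annotation-cost argument concrete, the following sketch labels rejected nodes one community at a time: it picks a few high-degree members of each community as representatives, sends a single prompt per community, and copies the returned label to every member, so the number of LLM calls grows with the number of communities rather than the number of nodes. It is a single-granularity simplification, not the paper's GLA; the precomputed community assignment, the degree-based representative choice, the prompt wording, and the `ask_llm` callable are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical sketch: NOT the paper's GLA; community ids are assumed precomputed.

def annotate_by_community(
    unknown_nodes: List[int],
    community: Dict[int, int],
    degree: Dict[int, int],
    texts: Dict[int, str],
    ask_llm: Callable[[str], str],
    reps_per_community: int = 3,
) -> Dict[int, str]:
    """Assign one LLM-generated label to all rejected nodes in each community."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for v in unknown_nodes:
        groups[community[v]].append(v)

    labels: Dict[int, str] = {}
    for members in groups.values():
        # Structure-guided selection (assumed): highest-degree members represent the group.
        reps = sorted(members, key=lambda v: degree[v], reverse=True)[:reps_per_community]
        prompt = (
            "These nodes were rejected as unknown-class and share a graph community.\n"
            + "\n".join(f"- {texts[v]}" for v in reps)
            + "\nPropose one short category name that covers all of them."
        )
        label = ask_llm(prompt).strip()  # one LLM call per community, not per node
        for v in members:
            labels[v] = label
    return labels
```

Passing a stub for `ask_llm` that records its prompts is an easy way to confirm that five rejected nodes split across two communities cost two calls, not five.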

Paper Structure

This paper contains 57 sections, 8 theorems, 64 equations, 11 figures, 16 tables.

Key Result

Theorem 1

Let embeddings $\tilde{E} = \{\tilde{e}_i\}$ be generated by a Lipschitz-continuous encoder over a compact manifold $\mathcal{M}$, and let class concepts $\{C_c\}$ be aggregated via a propagation matrix $P$ with Dirichlet energy bounded by $\delta$. If intra-class variance is bounded by $\sigma^2$,
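Theorem 1 is stated in terms of Dirichlet energy. Under the usual graph convention (an assumption here, since the statement above does not fix a normalization), the Dirichlet energy of node features $X$ on a graph with symmetric adjacency $A$ is $\operatorname{tr}(X^\top L X) = \tfrac{1}{2}\sum_{i,j} A_{ij}\lVert x_i - x_j\rVert_2^2$ with $L = D - A$; it is small when neighboring nodes have similar embeddings. The sketch below checks that the two forms agree and that one step of GCN-style normalized propagation does not increase the energy measured with the matching normalized Laplacian $I - P$. All helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the standard Dirichlet energy; the paper's
# exact normalization for Theorem 1 is an assumption here.

def dirichlet_energy(X, A):
    """tr(X^T L X) with the combinatorial Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))

def dirichlet_energy_pairwise(X, A):
    """Equivalent edge form: 0.5 * sum_ij A_ij * ||x_i - x_j||^2 (A symmetric)."""
    diff = X[:, None, :] - X[None, :, :]            # shape (n, n, d)
    return float(0.5 * (A * (diff ** 2).sum(axis=2)).sum())

def normalized_propagation(A):
    """Augmented normalized adjacency P = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d[:, None] * d[None, :]

def normalized_energy(X, P):
    """Dirichlet energy w.r.t. the matching normalized Laplacian I - P."""
    return float(np.trace(X.T @ (np.eye(P.shape[0]) - P) @ X))
```

The non-increase holds because $P$ is symmetric with eigenvalues in $(-1, 1]$: each spectral component of the energy, weighted by $1 - \lambda \ge 0$, is multiplied by $\lambda^2 \le 1$ after one step. This is the kind of neighborhood smoothness that a Dirichlet-energy bound such as the one in Theorem 1 constrains.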

Figures (11)

  • Figure 1: A comparative overview of our proposed open-world learning pipeline and the conventional one. Our proposed OGA integrates LLMs to introduce a new paradigm for unlabeled data uncertainty.
  • Figure 2: The overview of our proposed OGA.
  • Figure 3: ALT sensitivity analysis (model).
  • Figure 4: GLA sensitivity analysis (data).
  • Figure 5: Test accuracy convergence curves of GCN, GAT, and GraphSAGE on the Children dataset under three conditions: lower bound, upper bound, and OGA-enhanced graph. The curves represent the model's performance across epochs.
  • ...and 6 more figures

Theorems & Definitions (12)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5: Concept Space Properties
  • Proof
  • Theorem 6: Topology-driven Hyperspherical Concept Modeling in ALT
  • Proof
  • Theorem 7: Confidence Bound for Unknown-Class Rejection
  • Proof
  • ...and 2 more