Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models
Xueqi Ma, Xingjun Ma, Sarah Monazam Erfani, Danilo Mandic, James Bailey
TL;DR
The paper tackles open-set node classification on graphs with a coarse-to-fine framework (CFC) that uses large language models (LLMs) to identify semantic OOD samples and generate candidate OOD labels. A GNN-based fine classifier then classifies ID nodes and detects OOD nodes, aided by denoising and OOD data augmentation via manifold mixup. In the final step, LLM prompts over a post-processed OOD label space assign labels to the detected OOD samples. The abstract reports a ten percent improvement in OOD detection over state-of-the-art methods on graph and text domains, and up to seventy percent accuracy in OOD classification on graph datasets. The approach relies on semantic, interpretable OOD instances rather than synthetic or auxiliary OOD samples, and is accompanied by a theoretical analysis supporting subspace expansion and smoother decision boundaries.
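To make the manifold mixup step more concrete, here is a minimal sketch of interpolating hidden representations of nodes already flagged as OOD to obtain extra OOD training points. It is an illustration under stated assumptions, not the authors' implementation: the function names `manifold_mixup` and `augment_ood`, the choice to mix pairs of OOD embeddings, and the Beta(alpha, alpha) coefficient are all hypothetical choices made for this example.

```python
import random


def manifold_mixup(h_a, h_b, alpha=1.0, rng=random):
    """Convexly combine two hidden vectors with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    return mixed, lam


def augment_ood(hidden_ood, num_new, alpha=1.0, seed=0):
    """Create num_new synthetic hidden vectors from pairs of OOD embeddings.

    hidden_ood: list of equal-length float lists (assumed to come from an
    intermediate GNN layer; that layer choice is an assumption here).
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(num_new):
        h_a, h_b = rng.sample(hidden_ood, 2)  # two distinct OOD nodes
        mixed, _ = manifold_mixup(h_a, h_b, alpha, rng)
        augmented.append(mixed)
    return augmented


# Toy hidden vectors for three nodes flagged as OOD by the coarse stage.
hidden_ood = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
extra = augment_ood(hidden_ood, num_new=4)
```

Because each new vector is a convex combination of two existing ones, it stays inside the region spanned by the flagged OOD embeddings, which is the intuition behind using mixup to fill in that region.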
Abstract
Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, even though real-world applications, especially high-stakes settings such as fraud detection and medical diagnosis, demand deeper insight into OOD samples, including their probable labels. This raises a critical question: can OOD detection be extended to OOD classification without true label information? To address this question, we propose a Coarse-to-Fine open-set Classification (CFC) framework that leverages large language models (LLMs) for graph datasets. CFC consists of three key components: a coarse classifier that uses LLM prompts for OOD detection and outlier label generation; a GNN-based fine classifier, trained with the OOD samples identified by the coarse classifier, for enhanced OOD detection and ID classification; and a refined OOD classification step that uses LLM prompts and post-processed OOD labels. Unlike methods that rely on synthetic or auxiliary OOD samples, CFC employs semantic OOD instances that are genuinely out-of-distribution based on their inherent meaning, improving interpretability and practical utility. Experimental results show that CFC improves OOD detection by ten percent over state-of-the-art methods on graph and text domains and achieves up to seventy percent accuracy in OOD classification on graph datasets.
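The three components described above can be sketched as a toy control flow. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_llm` is a hypothetical keyword-matching stand-in for a real LLM call, the function names and the frequency-based label post-processing are invented for this example, and the GNN-based fine classifier is omitted (the coarse OOD set is passed through unchanged).

```python
from collections import Counter

ID_LABELS = ["databases", "networking"]  # toy in-distribution label set


def toy_llm(text, label_space, allow_new):
    """Hypothetical stand-in for an LLM prompt: keyword matching only."""
    keywords = {"databases": "sql", "networking": "tcp",
                "vision": "image", "robotics": "robot"}
    for label, kw in keywords.items():
        if kw in text.lower() and (label in label_space or allow_new):
            return label
    return "unknown"


def coarse_classify(texts, id_labels, llm):
    """Stage 1: flag nodes whose LLM label falls outside the ID label set."""
    id_idx, ood_idx, candidates = [], [], []
    for i, text in enumerate(texts):
        label = llm(text, id_labels, allow_new=True)
        if label in id_labels:
            id_idx.append(i)
        else:
            ood_idx.append(i)
            candidates.append(label)  # candidate OOD label from the LLM
    return id_idx, ood_idx, candidates


def build_post_ood_label_space(candidates, min_count=1):
    """Post-process candidate labels (assumed here: drop 'unknown', filter by count)."""
    counts = Counter(c for c in candidates if c != "unknown")
    return sorted(label for label, n in counts.items() if n >= min_count)


def classify_ood(texts, ood_idx, ood_label_space, llm):
    """Stage 3: re-prompt with the closed post-OOD label space."""
    return {i: llm(texts[i], ood_label_space, allow_new=False) for i in ood_idx}


texts = ["SQL query planner", "TCP congestion control",
         "Image segmentation", "Robot arm control", "Image denoising"]
id_idx, ood_idx, candidates = coarse_classify(texts, ID_LABELS, toy_llm)
# Stage 2 (GNN-based fine classifier refining ood_idx) is omitted in this sketch.
ood_space = build_post_ood_label_space(candidates)
ood_labels = classify_ood(texts, ood_idx, ood_space, toy_llm)
```

In this toy run the first two texts are kept as ID, and the remaining three receive labels from the post-processed OOD label space. In a real system, the fine classifier would sit between the first and third stages to correct noisy coarse decisions before OOD labels are assigned.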
