LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs

Xiaoxu Ma, Dong Li, Minglai Shao, Xintao Wu, Chen Zhao

Abstract

Text-attributed graphs, in which nodes carry textual attributes, have become a powerful tool for modeling real-world networks such as citation, social, and transaction networks. However, existing methods for learning from these graphs often assume that training and test data follow the same distribution, and their performance degrades significantly when they encounter out-of-distribution (OOD) data. In this paper, we address node-level OOD detection in text-attributed graphs, with the goal of maintaining accurate node classification while simultaneously identifying OOD nodes. We propose LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs (LECT), which integrates large language models (LLMs) with energy-based contrastive learning. The method has two components: it generates high-quality, dependency-aware pseudo-OOD nodes by leveraging the semantic understanding and contextual knowledge of LLMs, and it applies contrastive learning based on energy functions to distinguish in-distribution (IND) nodes from OOD nodes. Extensive experiments on six benchmark datasets show that LECT consistently outperforms state-of-the-art baselines, achieving both high classification accuracy and robust OOD detection.

Paper Structure

This paper contains 13 sections, 12 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: The overall pipeline of LECT. Given a text-attributed graph, we first construct pseudo-OOD nodes by injecting random edges and generating their textual content with an LLM. We then derive node representations using a text encoder and a GNN with a projector. Energy scores are subsequently computed to form Linked IND–OOD Pairs and Triplet Contrastive Pairs for training. Finally, the model identifies OOD samples based on the resulting energy levels.
  • Figure 2: Ablation study results of LECT on the Cora, Citeseer, and Pubmed datasets, showing the performance without contrastive learning and without LLM-generated samples, respectively.
  • Figure 3: Ablation study results of LECT on the Cora, Citeseer, and Pubmed datasets, showing the performance without $\mathcal{L}_{\text{ind-ood}}$ and without $\mathcal{L}_{\text{triplet}}$, respectively.
  • Figure 4: t-SNE visualization of node embeddings on the Cora dataset for different baseline models and LECT.
  • Figure 5: t-SNE visualization and textual representation of the generated near-OOD and far-OOD samples on the Citeseer dataset.
  • ...and 1 more figure
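
The Figure 1 caption describes computing energy scores and identifying OOD samples from the resulting energy levels, but this excerpt does not give the formulas. The sketch below illustrates the general idea under stated assumptions: it uses the log-sum-exp free energy that is common in energy-based OOD detection, which may differ from the energy function LECT actually uses. The names `energy_score`, `flag_ood`, and `pair_margin_loss`, the threshold, and the margin are hypothetical, and the hinge penalty is only an illustrative stand-in, not the paper's $\mathcal{L}_{\text{ind-ood}}$ or $\mathcal{L}_{\text{triplet}}$.

```python
import math
from typing import List, Sequence


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Assumed energy: E(x) = -T * log(sum_k exp(logit_k / T)).

    Confident (peaked) logits give low energy; flat logits give higher energy.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def flag_ood(batch_logits: Sequence[Sequence[float]],
             threshold: float,
             temperature: float = 1.0) -> List[bool]:
    """Flag a node as OOD when its energy exceeds a (hypothetical) threshold."""
    return [energy_score(row, temperature) > threshold for row in batch_logits]


def pair_margin_loss(e_ind: float, e_ood: float, margin: float = 1.0) -> float:
    """Illustrative hinge penalty for one IND/OOD pair (an assumption).

    It is zero once the OOD energy exceeds the IND energy by at least `margin`.
    """
    return max(0.0, margin + e_ind - e_ood)


if __name__ == "__main__":
    confident = [8.0, 0.5, 0.2]  # peaked logits, typical of an IND-like node
    uncertain = [0.3, 0.2, 0.1]  # flat logits, typical of an OOD-like node
    print(flag_ood([confident, uncertain], threshold=-3.0))
    print(pair_margin_loss(energy_score(confident), energy_score(uncertain)))
```

In practice the threshold would be chosen on validation data, for example to fix a target true-positive rate on IND nodes; the value used here is arbitrary.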