Bridging Large Language Models and Graph Structure Learning Models for Robust Representation Learning

Guangxin Su, Yifan Zhu, Wenjie Zhang, Hanchen Wang, Ying Zhang

TL;DR

LangGSL is a robust framework that integrates the complementary strengths of pre-trained language models and graph structure learning models (GSLMs) to jointly enhance node feature and graph structure learning. The two models work synergistically within a variational information-maximizing framework, complementing each other's strengths and offsetting each other's weaknesses.

Abstract

Graph representation learning, involving both node features and graph structures, is crucial for real-world applications but often encounters pervasive noise. State-of-the-art methods typically address noise by focusing separately on node features with large language models (LLMs) and on graph structures with graph structure learning models (GSLMs). In this paper, we introduce LangGSL, a robust framework that integrates the complementary strengths of pre-trained language models and GSLMs to jointly enhance both node feature and graph structure learning. In LangGSL, we first leverage LLMs to filter noise in the raw data and extract valuable cleaned information as features, enhancing the synergy of downstream models. During the mutual learning phase in LangGSL, the core idea is to leverage a relatively small language model (LM) to process local attributes and generate reliable pseudo-labels and informative node embeddings, which are then integrated into the GSLM's prediction phase. This approach enriches the global context and enhances overall performance. Meanwhile, the GSLM refines the evolving graph structure constructed from the LM's output and offers updated labels back to the LM as additional guidance, thus facilitating a more effective mutual learning process. The LM and GSLM work synergistically, complementing each other's strengths and offsetting each other's weaknesses within a variational information-maximizing framework, resulting in enhanced node features and a more robust graph structure. Extensive experiments on diverse graph datasets of varying scales and across different task scenarios demonstrate the scalability and effectiveness of the proposed approach.
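
The alternating exchange described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation and does not model the variational information-maximizing objective. Every name here (lm_step, gslm_step, mutual_learning) is hypothetical, and the stand-ins are assumptions chosen for brevity: the "LM" is a nearest-centroid classifier that also proposes a k-nearest-neighbor graph, and the "GSLM" is one round of neighbor voting followed by pruning of edges whose endpoints disagree.

```python
from collections import Counter


def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lm_step(features, labels, k):
    """Toy stand-in for the LM (assumption: a nearest-centroid classifier).

    `labels` maps node index -> class for gold labels plus any pseudo-labels
    handed back by the GSLM. Returns (embeddings, pseudo_labels, graph).
    """
    sums, counts = {}, Counter()
    for i, y in labels.items():
        acc = sums.setdefault(y, [0.0] * len(features[i]))
        for d, v in enumerate(features[i]):
            acc[d] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    embeddings = features  # a real LM would produce learned embeddings here
    pseudo = [
        min(sorted(centroids), key=lambda y: _dist2(e, centroids[y]))
        for e in embeddings
    ]
    # Assumption: the initial graph is a k-nearest-neighbor graph over embeddings.
    n = len(embeddings)
    graph = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: (_dist2(embeddings[i], embeddings[j]), j))[:k]
        for i in range(n)
    ]
    return embeddings, pseudo, graph


def gslm_step(graph, lm_pseudo, gold):
    """Toy stand-in for the GSLM (assumption: neighbor voting plus edge pruning)."""
    refined = []
    for i, nbrs in enumerate(graph):
        if i in gold:
            refined.append(gold[i])  # gold labels are never overwritten
            continue
        votes = Counter(lm_pseudo[j] for j in nbrs)
        votes[lm_pseudo[i]] += 1  # the node's own LM prediction counts once
        refined.append(max(sorted(votes), key=lambda y: votes[y]))
    # Refine the structure: keep only edges whose endpoints agree on the label.
    refined_graph = [[j for j in nbrs if refined[j] == refined[i]]
                     for i, nbrs in enumerate(graph)]
    return refined_graph, refined


def mutual_learning(features, gold, rounds=3, k=3):
    """Alternate LM and GSLM steps, passing pseudo-labels in both directions."""
    labels = dict(gold)
    graph, gslm_pseudo = [], []
    for _ in range(rounds):
        _, lm_pseudo, graph = lm_step(features, labels, k)
        graph, gslm_pseudo = gslm_step(graph, lm_pseudo, gold)
        labels = dict(enumerate(gslm_pseudo))  # GSLM labels guide the next LM step
        labels.update(gold)
    return graph, gslm_pseudo


if __name__ == "__main__":
    features = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
    gold = {0: 0, 5: 1}  # only two nodes carry gold labels
    graph, labels = mutual_learning(features, gold)
    print(labels)
    print(graph)
```

The loop mirrors the information flow only: the LM side produces embeddings, pseudo-labels, and an initial graph, and the GSLM side returns a refined graph and updated labels that become training signal for the next LM step. In the actual framework both sides would be trained neural models.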

Paper Structure

This paper contains 22 sections, 18 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Key advantages of the modules in LangGSL.
  • Figure 2: The pipeline of LangGSL. (a) The LLM filters out irrelevant or noisy information from the raw text and provides task-related text attributes using designed prompts. (b) An iterative optimization process runs between the LM and the graph structure learning model (GSLM): the LM generates graph structures, embeddings, and pseudo-labels from the cleaned text attributes provided by the LLM, and the GSLM in turn refines the graph structure and returns updated pseudo-labels to the LM. Three interaction mechanisms are introduced between the vanilla GNN and the graph structure refinement component in the GSLM.
  • Figure 3: Node classification accuracy (%) on Pubmed under adversarial attack at various perturbation rates (Ptb Rate).
  • Figure 4: Experiments on Pubmed with different graph structure learning backbones under the LangGSL framework.
  • Figure 5: Comparison of test accuracy of LangGSL on Raw Texts vs. Cleaned Texts.
  • ...and 4 more figures