LLM-driven Knowledge Distillation for Dynamic Text-Attributed Graphs

Amit Roy, Ning Yan, Masood Mortazavi

TL;DR

This work addresses learning effective representations for Dynamic Text-Attributed Graphs (DyTAGs) by marrying temporal-encoded GNNs with text-rich edge representations generated by large language models. The proposed LKD4DyTAG framework uses a simple temporal encoding on edges to capture structural and temporal dynamics, while distilling the semantic knowledge from a frozen LLM into the spatio-temporal GNN representation via a knowledge distillation loss. Empirical results on six real-world DyTAG benchmarks show that LKD4DyTAG achieves strong performance on future link prediction in the transductive setting and generally improved edge classification, with ablation analyses highlighting the importance of temporal encoding and the distillation loss for semantic guidance. The approach demonstrates a scalable path to integrate linguistic context into dynamic graphs, with potential for handling large-scale DyTAGs and more complex temporal-textual interactions in downstream applications.

Abstract

Dynamic Text-Attributed Graphs (DyTAGs) have numerous real-world applications, e.g., social, collaboration, citation, communication, and review networks. In these networks, nodes and edges often contain text descriptions, and the graph structure can evolve over time. Future link prediction, edge classification, relation generation, and other downstream tasks on DyTAGs require powerful representations that encode structural, temporal, and textual information. Although graph neural networks (GNNs) excel at handling structured data, encoding temporal information within dynamic graphs remains a significant challenge. In this work, we propose LLM-driven Knowledge Distillation for Dynamic Text-Attributed Graphs (LKD4DyTAG) with temporal encoding to address these challenges. We use a simple yet effective approach to encode temporal information in edges so that graph convolution can simultaneously capture both temporal and structural information in the hidden representations. To leverage an LLM's text processing capabilities for learning richer representations on DyTAGs, we distill knowledge from LLM-driven edge representations (based on a neighborhood's text attributes) into spatio-temporal representations using a lightweight GNN model that encodes temporal and structural information. The knowledge distillation objective enables the GNN to learn representations that more effectively encode the available structural, temporal, and textual information in DyTAGs. We conducted extensive experiments on six real-world DyTAG datasets to verify the effectiveness of our approach, LKD4DyTAG, for the future link prediction and edge classification tasks. The results show that our approach significantly improves the performance of downstream tasks compared to the baseline models.

Paper Structure

This paper contains 14 sections, 9 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Example Dynamic Text-Attributed Graph in the context of the 2024 Presidential Election in the USA. Each entity is described with text, while the relations between them are represented by edges that include temporal information.
  • Figure 2: LKD4DyTAG: LLM-driven Knowledge Distillation for Dynamic Text-Attributed Graphs with temporal encoding. First, temporal encodings (red) are added to the graph edges along with their BERT encodings (green). The graph information is then transformed into textual information by incorporating the 1-hop neighbors of the adjacent nodes to encode the semantic context, as illustrated in the textbox. The derived text description is fed into a pretrained teacher LLM to obtain the textual representation, while the graph is fed into a trainable student GNN to obtain the spatio-temporal representation. These two representations are brought closer in latent space by maximizing their cosine similarity (equivalently, minimizing a cosine-distance term), which defines the knowledge distillation loss. Additionally, the student model's spatio-temporal edge representation is used for the downstream task, which is trained using the task-specific loss.
  • Figure 3: Textual edge representation from the LLM using the descriptions of adjacent nodes' neighbors.
  • Figure 4: The impact of $\lambda_{kd}$ for future link prediction and edge classification tasks.
  • Figure 5: The impact of knowledge distillation on the future link prediction task for the inductive setting (right) and the transductive setting (left).
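
The Figure 2 caption describes a cosine-based knowledge distillation loss that pulls the student GNN's edge representations toward the frozen teacher LLM's, trained alongside a task-specific loss, and Figure 4 studies a weight $\lambda_{kd}$. The following is a minimal sketch of how such an objective might be computed, not the paper's implementation. It assumes that the distillation term is the mean cosine distance (1 minus cosine similarity) over a batch of edges, that the two losses are combined additively as `task_loss + lambda_kd * kd_loss`, and that student and teacher vectors already share one dimension (for instance after a projection layer). The function names `cosine_similarity`, `kd_loss`, and `total_loss` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def kd_loss(
    student_edges: Sequence[Sequence[float]],
    teacher_edges: Sequence[Sequence[float]],
) -> float:
    """Mean cosine distance between paired student and teacher edge vectors.

    Assumption: the distillation term is 1 - cosine similarity, so it is 0
    when the two representations point the same way and 2 when opposite.
    """
    if not student_edges or len(student_edges) != len(teacher_edges):
        raise ValueError("need equally sized, non-empty batches")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_edges, teacher_edges)
    ]
    return sum(distances) / len(distances)


def total_loss(
    task_loss: float,
    student_edges: Sequence[Sequence[float]],
    teacher_edges: Sequence[Sequence[float]],
    lambda_kd: float,
) -> float:
    """Assumed additive objective: task loss plus weighted distillation loss."""
    return task_loss + lambda_kd * kd_loss(student_edges, teacher_edges)


if __name__ == "__main__":
    student = [[1.0, 0.0], [0.0, 2.0]]   # toy student GNN edge vectors
    teacher = [[2.0, 0.0], [3.0, 0.0]]   # toy teacher LLM edge vectors
    print(total_loss(task_loss=0.5, student_edges=student,
                     teacher_edges=teacher, lambda_kd=0.1))
```

In a real training loop the teacher vectors would be precomputed once, since the teacher is frozen, and only the student's parameters would receive gradients from both terms. Setting `lambda_kd` to zero recovers training on the task loss alone, which is the kind of comparison an ablation over $\lambda_{kd}$ would make.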