LLM-driven Knowledge Distillation for Dynamic Text-Attributed Graphs
Amit Roy, Ning Yan, Masood Mortazavi
TL;DR
This work addresses learning effective representations for Dynamic Text-Attributed Graphs (DyTAGs) by combining temporally encoded GNNs with text-rich edge representations generated by large language models. The proposed LKD4DyTAG framework uses a simple temporal encoding on edges to capture structural and temporal dynamics, and distills semantic knowledge from a frozen LLM into the spatio-temporal GNN representation via a knowledge distillation loss. Empirical results on six real-world DyTAG benchmarks show that LKD4DyTAG achieves strong performance on future link prediction in the transductive setting and generally improved edge classification. Ablation analyses highlight the importance of the temporal encoding and of the distillation loss for semantic guidance. The approach offers a scalable path to integrating linguistic context into dynamic graphs, with potential for handling large-scale DyTAGs and more complex temporal-textual interactions in downstream applications.
Abstract
Dynamic Text-Attributed Graphs (DyTAGs) have numerous real-world applications, e.g., in social, collaboration, citation, communication, and review networks. In these networks, nodes and edges often carry text descriptions, and the graph structure can evolve over time. Future link prediction, edge classification, relation generation, and other downstream tasks on DyTAGs require powerful representations that encode structural, temporal, and textual information. Although graph neural networks (GNNs) excel at handling structured data, encoding temporal information within dynamic graphs remains a significant challenge. In this work, we propose LLM-driven Knowledge Distillation for Dynamic Text-Attributed Graphs (LKD4DyTAG) with temporal encoding to address these challenges. We use a simple yet effective approach to encode temporal information in edges so that graph convolution can capture temporal and structural information simultaneously in the hidden representations. To leverage an LLM's text processing capabilities for learning richer representations on DyTAGs, we distill knowledge from LLM-driven edge representations (based on a neighborhood's text attributes) into the spatio-temporal representations learned by a lightweight GNN model that encodes temporal and structural information. The knowledge distillation objective enables the GNN to learn representations that more effectively encode the structural, temporal, and textual information available in DyTAGs. We conducted extensive experiments on six real-world DyTAG datasets to verify the effectiveness of LKD4DyTAG on future link prediction and edge classification tasks. The results show that our approach significantly improves performance on these downstream tasks compared to the baseline models.
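
The two ingredients named in the abstract, a temporal encoding attached to edges and a distillation term that pulls the GNN's edge representations toward those of a frozen LLM, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the cosine form of `time_encode`, the mean-squared-error form of `distillation_loss`, the weight `kd_weight`, and all function names are hypothetical, since the abstract does not state the exact encoding or loss used by LKD4DyTAG.

```python
import math


def time_encode(delta_t, dim=4):
    """Hypothetical cosine encoding of a time gap (not the paper's stated form).

    Each output dimension uses a different fixed frequency, so nearby
    timestamps map to nearby vectors.
    """
    return [math.cos(delta_t / (10.0 ** (i / dim))) for i in range(dim)]


def edge_features(raw_feat, delta_t, dim=4):
    """Append the temporal encoding to an edge's raw feature vector, so a
    graph convolution over these features sees structure and time together."""
    return list(raw_feat) + time_encode(delta_t, dim)


def distillation_loss(student, teacher):
    """Assumed distillation term: mean squared error between the GNN (student)
    edge embedding and the frozen LLM (teacher) edge embedding."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher embeddings must have equal length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def total_loss(task_loss, student, teacher, kd_weight=0.5):
    """Downstream task loss plus the weighted distillation term."""
    return task_loss + kd_weight * distillation_loss(student, teacher)
```

In this sketch the teacher embedding is treated as a constant, mirroring the frozen LLM: only the student side would receive gradients in a real training loop, and `kd_weight` would control how strongly the semantic signal shapes the GNN representation.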
