LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Junjie Xu; Zongyu Wu; Minhua Lin; Xiang Zhang; Suhang Wang

TL;DR

GALLON tackles molecular property prediction by combining multimodal molecular information (SMILES strings, molecular diagrams, and graphs) with knowledge from both an LLM and a GNN. It distills both teachers into a compact MLP, enabling efficient, scalable inference while achieving state-of-the-art or competitive accuracy on MoleculeNet tasks. The results indicate that combining representation and label distillation from heterogeneous teachers yields superior performance, and they highlight the importance of multimodal prompts and cross-modal mappings. Because the student is a small MLP, the approach suits large-scale screening and can be extended to other multimodal scientific domains.

Abstract

Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.
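
To make the idea of combining label and representation distillation concrete, here is a minimal sketch of a per-molecule student objective. It is an illustration under stated assumptions, not the loss used in the paper: the function names, the weights `alpha` and `beta`, and the choice of binary cross-entropy and mean squared error are all hypothetical, and the teacher outputs are assumed to be precomputed.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and a target y in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_prob, student_repr, teacher_prob, teacher_repr,
                      label, alpha=0.5, beta=0.5):
    """Hypothetical combined objective for one molecule.

    supervised: student prediction vs. ground-truth label
    label_kd:   student prediction vs. the teacher's soft label
    repr_kd:    student hidden vector vs. the teacher's embedding
    alpha and beta are assumed weighting hyperparameters.
    """
    supervised = bce(student_prob, label)
    label_kd = bce(student_prob, teacher_prob)
    repr_kd = mse(student_repr, teacher_repr)
    return supervised + alpha * label_kd + beta * repr_kd


# Usage: the loss shrinks as the student's hidden vector approaches the teacher's.
far = distillation_loss(0.7, [0.0, 0.0], 0.9, [1.0, 1.0], label=1.0)
near = distillation_loss(0.7, [1.0, 1.0], 0.9, [1.0, 1.0], label=1.0)
```

In this sketch the teacher could be the GNN, the LLM, or a combination of the two; how their outputs are merged is left open here because the excerpt does not specify it.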

Paper Structure (21 sections, 12 equations, 7 figures, 10 tables)

Introduction
Related Work
Methodology: GALLON
Extracting Knowledge from LLMs
Pretraining and Finetuning
Knowledge Distillation to MLP
Experiments
Experimental Setup
Distillation Results
Contributions of LLM and GNN
Efficiency Comparison
Influence of Multimodality
Influence of Large Language Model
Case Study
Conclusion and Future Work
...and 6 more sections

Figures (7)

Figure 1: The framework of GALLON (Graph Learning from Large Language Model Distillation).
Figure 2: An example prompt for a molecule of the FreeSolv dataset.
Figure 3: ROCAUC vs log of inference time (ms) on the BACE dataset.
Figure 4: ROCAUC vs log of number of parameters on the BACE dataset.
Figure 5: Ablation study of multimodalities.
...and 2 more figures
