LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning
Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang
TL;DR
GALLON tackles molecular property prediction by combining multimodal molecular information (SMILES strings, molecular diagrams, and graphs) with knowledge from both a large language model (LLM) and a graph neural network (GNN). It distills both teachers into a compact multilayer perceptron (MLP), enabling efficient, scalable inference while achieving state-of-the-art or competitive accuracy on MoleculeNet tasks. The framework demonstrates that combining representation and label distillation from heterogeneous teachers yields superior performance, and it highlights the importance of multimodal prompts and cross-modal mappings. The approach is practical for large-scale screening and can be extended to other multimodal scientific domains.
Abstract
Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for property prediction. Nevertheless, molecular data encompasses more than graph structure: it also includes textual and visual information that GNNs do not handle well. To bridge this gap, we present a framework that uses multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), which synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.
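To make the idea of distilling two teachers into one MLP student more concrete, the following is a minimal sketch of a combined objective that mixes a supervised term with label distillation and representation distillation. It is not GALLON's actual loss, which this summary does not specify. It uses a generic soft-label KL term and a mean-squared-error feature-matching term as stand-ins. The function names, the weights `alpha` and `beta`, the temperature, and the assumption that teacher and student representations share a dimension are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Supervised loss of the student on the ground-truth class index."""
    return -math.log(softmax(student_logits)[label])


def label_distillation(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def representation_distillation(student_repr, teacher_repr):
    """Mean squared error between student and teacher vectors.

    Assumes both vectors already live in the same space (hypothetical:
    in practice a learned projection would usually align them).
    """
    n = len(student_repr)
    return sum((s - t) ** 2 for s, t in zip(student_repr, teacher_repr)) / n


def distillation_loss(student_logits, student_repr, label, teachers,
                      alpha=0.5, beta=0.5):
    """Supervised loss plus weighted distillation terms from each teacher.

    `teachers` is a list of dicts with "logits" and "repr" keys, e.g. one
    entry for an LLM teacher and one for a GNN teacher (illustrative only).
    """
    loss = cross_entropy(student_logits, label)
    for teacher in teachers:
        loss += alpha * label_distillation(student_logits, teacher["logits"])
        loss += beta * representation_distillation(student_repr, teacher["repr"])
    return loss


if __name__ == "__main__":
    # Toy values for one molecule in a binary classification task.
    llm_teacher = {"logits": [2.0, -1.0], "repr": [0.1, 0.3, 0.5]}
    gnn_teacher = {"logits": [1.5, -0.5], "repr": [0.2, 0.2, 0.6]}
    loss = distillation_loss(
        student_logits=[0.5, 0.0],
        student_repr=[0.0, 0.0, 0.0],
        label=0,
        teachers=[llm_teacher, gnn_teacher],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

In this sketch the teachers are only needed to compute the training loss. At inference time only the student would run, which is the source of the efficiency benefit the abstract describes.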
