Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning

Sakhinana Sagar Srinivas, Venkataramana Runkana

TL;DR

This work introduces MMF, a Multi-Modal Fusion framework that combines graph-based molecular representations with linguistic knowledge from Large Language Models (LLMs) to predict molecular properties more accurately and robustly. MMF pairs Graph Chebyshev Convolution with cross-modal attention and a mixture-of-experts (MOE) output layer, and it draws on zero-shot chain-of-thought (CoT) descriptions and few-shot in-context learning (ICL) prompts without fine-tuning the LLMs. It achieves state-of-the-art results on QM8, QM9, and additional datasets. Key contributions include a detailed task formulation, a scalable cross-modal fusion mechanism, ablations showing the roles of SEG and MOE-DP, and experiments across diverse datasets. Because it relies on off-the-shelf LLMs and Graph Neural Networks (GNNs), the approach is practical for drug discovery and materials design, offering improved predictive performance and resilience to distribution shifts.
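The TL;DR names Graph Chebyshev Convolution as the graph encoder. As a rough illustration only, the sketch below implements the standard Chebyshev spectral filter, sum over k of T_k(L~) X Theta_k, in pure Python. The filter order K, the approximation lambda_max = 2, the toy graph, and all helper names (`cheb_conv`, `scaled_laplacian`, `lin_comb`) are assumptions for illustration, not details taken from the paper.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lin_comb(a: Matrix, b: Matrix, alpha: float = 1.0, beta: float = 1.0) -> Matrix:
    """Element-wise alpha * a + beta * b."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled_laplacian(adj: Matrix, lambda_max: float = 2.0) -> Matrix:
    """L~ = 2L / lambda_max - I, where L = I - D^{-1/2} A D^{-1/2} (assumed normalization)."""
    n = len(adj)
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    lap = [[eye[i][j] - inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
           for i in range(n)]
    return lin_comb(lap, eye, 2.0 / lambda_max, -1.0)


def cheb_conv(x: Matrix, adj: Matrix, thetas: list[Matrix]) -> Matrix:
    """Chebyshev graph convolution: sum_k T_k(L~) X Theta_k, with K = len(thetas)."""
    l_tilde = scaled_laplacian(adj)
    t_prev, t_curr = x, matmul(l_tilde, x)  # T_0(L~) X and T_1(L~) X
    out = matmul(t_prev, thetas[0])
    if len(thetas) > 1:
        out = lin_comb(out, matmul(t_curr, thetas[1]))
    for theta in thetas[2:]:
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        t_next = lin_comb(matmul(l_tilde, t_curr), t_prev, 2.0, -1.0)
        out = lin_comb(out, matmul(t_next, theta))
        t_prev, t_curr = t_curr, t_next
    return out


# Hypothetical toy molecule: 3 atoms in a chain, 2 input features, 1 output feature.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thetas = [[[0.5], [0.5]], [[0.1], [-0.1]], [[0.2], [0.3]]]  # K = 3, arbitrary weights
print(cheb_conv(x, adj, thetas))
```

The recurrence avoids any eigendecomposition: each extra order costs one sparse-friendly multiplication by L~, which is why Chebyshev filters scale to larger molecular graphs. In a trained model the Theta_k matrices would be learned parameters.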

Abstract

In chemistry, a central objective is to design novel molecules with desired properties, which depends on accurate property prediction for applications such as materials design and drug screening. However, existing graph deep learning methods have limited expressive power. To address this, we explore integrating the broad molecular domain knowledge of Large Language Models (LLMs) with the complementary strengths of Graph Neural Networks (GNNs) to improve performance on property prediction tasks. We introduce a Multi-Modal Fusion (MMF) framework that combines the ability of GNNs to model graph-structured data with the generative and predictive abilities of LLMs, including their zero-shot and few-shot learning capabilities. This combination improves the accuracy and robustness of molecular property predictions while reducing the risk of overfitting. Furthermore, our approach effectively addresses distributional shifts, a common challenge in real-world applications, and demonstrates the value of learning cross-modal representations, outperforming state-of-the-art baselines on benchmark datasets for property prediction tasks.
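The abstract credits much of the gain to learning cross-modal representations, and the TL;DR says the fusion uses cross-modal attention. One common form of this is scaled dot-product cross-attention, softmax(Q K^T / sqrt(d)) V, sketched below under the assumption that GNN node embeddings supply the queries and embeddings of the LLM-generated text supply the keys and values. The attention direction, the projection shapes, and the names `cross_attention`, `atoms`, and `tokens` are hypothetical; the paper's exact fusion layer may differ.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(graph_emb: Matrix, text_emb: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Graph-side queries attend over text-side keys/values: softmax(Q K^T / sqrt(d)) V."""
    q = matmul(graph_emb, w_q)   # (num_atoms, d)
    k = matmul(text_emb, w_k)    # (num_tokens, d)
    v = matmul(text_emb, w_v)    # (num_tokens, d_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)    # (num_atoms, d_v): text context gathered for each atom


eye = [[1.0, 0.0], [0.0, 1.0]]                  # identity projections for the demo
atoms = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical GNN node embeddings
tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # hypothetical LLM text embeddings
fused = cross_attention(atoms, tokens, eye, eye, eye)
```

In practice w_q, w_k, and w_v would be learned, and the attended text context would typically be combined with the node embeddings (for example by a residual sum) before pooling to a molecule-level vector.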

Paper Structure

This paper contains 25 sections, 14 equations, 1 figure, and 15 tables.

Figures (1)

  • Figure 1: Overview of the MMF framework. The framework leverages both the generative and predictive abilities of LLMs in a robust, efficient, multi-step pipeline for high-precision molecular property prediction. (a) First, a multi-faceted semantic fusion strategy combines zero-shot chain-of-thought (CoT) prompting of LLMs with GNNs to generate semantically aligned cross-modal embeddings for molecules, integrating structured and unstructured data. (b) Second, the framework uses in-context learning (ICL), which taps the knowledge stored in the pre-trained parameters of LLMs to make predictions on new, unseen molecules; context-augmented prompts guide the generation of prediction embeddings without explicit fine-tuning on labeled data. (c) Finally, a mixture-of-experts (MOE) mechanism integrates the cross-modal and prediction embeddings through a gating mechanism at the output layer and optimizes the unified embeddings for downstream supervised regression tasks. Overall, the framework combines multiple learning strategies to achieve precise and efficient molecular property prediction. Note that we do not fine-tune LLMs for task-specific adaptation; instead, we access them through Language-Model-as-a-Service (LMaaS) platforms [sun2022black] via text-based API interaction. Steps (a), (b), and (c) are shown with blue, red, and black arrows, respectively.
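Step (c) of Figure 1 fuses the cross-modal embedding from step (a) and the prediction embedding from step (b) through a gated mixture of experts before regression. The sketch below shows one generic soft-MOE head under stated assumptions: the two embeddings are concatenated, each expert is a linear regressor, and a softmax gate weights the experts. The expert count, the linear experts, and the names `moe_regress`, `gate_weights`, and `expert_weights` are hypothetical, not the paper's MOE-DP design.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def moe_regress(h_cross: list[float], h_pred: list[float],
                gate_weights: list[list[float]],
                expert_weights: list[list[float]],
                expert_biases: list[float]) -> tuple[float, list[float]]:
    """Soft MOE head: y = sum_i g_i(h) * (w_i . h + b_i), with h = [h_cross ; h_pred]."""
    h = h_cross + h_pred                                # list concatenation of both embeddings
    gates = softmax([dot(w, h) for w in gate_weights])  # one gate per expert, sums to 1
    experts = [dot(w, h) + b for w, b in zip(expert_weights, expert_biases)]  # linear experts
    return sum(g * y for g, y in zip(gates, experts)), gates


# Hypothetical 2-d cross-modal embedding (step a) and 1-d prediction embedding (step b).
y_hat, gates = moe_regress([0.2, -0.1], [1.5],
                           gate_weights=[[0.3, 0.1, 0.0], [0.0, 0.2, 0.4]],
                           expert_weights=[[1.0, 0.5, 0.2], [0.1, 0.0, 0.9]],
                           expert_biases=[0.0, 0.1])
```

During training, y_hat would be compared with the labeled property value (for example with a mean-squared-error loss) and the gate and expert weights learned by gradient descent; that loop is omitted here. Because the gate depends on the input, the head can lean on the LLM-derived prediction embedding for some molecules and on the graph-derived features for others.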