LLM-Fusion: A Novel Multimodal Fusion Model for Accelerated Material Discovery

Onur Boyar, Indra Priyadarsini, Seiji Takeda, Lisa Hamada

TL;DR

The paper aims to accelerate material discovery by integrating diverse data modalities through LLM-Fusion, a novel LLM-based fusion mechanism. It encodes SMILES, SELFIES, Morgan fingerprints, and text descriptions, then passes the resulting modality-aware embeddings through an LLM to obtain a fixed-size fused representation. This approach achieves better property prediction performance than unimodal and naive fusion baselines. Across five prediction tasks on the QM9 and ChEBI-20 datasets, performance consistently improves as more modalities are incorporated, and larger LLMs further boost accuracy. The work demonstrates a scalable, flexible fusion framework with practical implications for rapid screening and discovery of materials with desired properties. It acknowledges computational cost as a drawback and outlines future extensions to generation tasks and newer LLM families.

Abstract

Discovering materials with desirable properties in an efficient way remains a significant problem in materials science. Many studies have tackled this problem by using different sets of information available about the materials. Among them, multimodal approaches have been found to be promising because of their ability to combine different sources of information. However, fusion algorithms to date remain simple, lacking a mechanism to provide a rich representation of multiple modalities. This paper presents LLM-Fusion, a novel multimodal fusion model that leverages large language models (LLMs) to integrate diverse representations, such as SMILES, SELFIES, text descriptions, and molecular fingerprints, for accurate property prediction. Our approach introduces a flexible LLM-based architecture that supports multimodal input processing and enables material property prediction with higher accuracy than traditional methods. We validate our model on two datasets across five prediction tasks and demonstrate its effectiveness compared to unimodal and naive concatenation baselines.
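To make the contrast between naive concatenation and modality-aware fusion concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. All names, dimensions, and random weights are hypothetical, the choice of which three modalities to combine is arbitrary, and the mean-pooling step is only a stand-in for the LLM that would actually consume the token sequence.

```python
import random

random.seed(0)

D_MODEL = 8  # hypothetical shared embedding width expected by the fusion model

# Hypothetical output sizes of the per-modality encoders (not from the paper).
MODALITY_DIMS = {"smiles": 6, "selfies": 6, "fingerprint": 16, "text": 10}


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# One projection per modality maps its embedding to the shared width.
PROJECTIONS = {name: random_matrix(D_MODEL, dim) for name, dim in MODALITY_DIMS.items()}
# One tag vector per modality marks which modality a token came from.
MODALITY_TAGS = {
    name: [random.gauss(0.0, 0.1) for _ in range(D_MODEL)] for name in MODALITY_DIMS
}


def naive_concat(embeddings):
    """Baseline: concatenate raw embeddings; output size grows with each modality."""
    fused = []
    for name in sorted(embeddings):
        fused.extend(embeddings[name])
    return fused


def modality_aware_tokens(embeddings):
    """Project each modality to D_MODEL and add its tag, giving one token per modality."""
    tokens = []
    for name in sorted(embeddings):
        projected = matvec(PROJECTIONS[name], embeddings[name])
        tokens.append([p + t for p, t in zip(projected, MODALITY_TAGS[name])])
    return tokens


def pool_fixed_size(tokens):
    """Stand-in for the LLM: average the tokens into one D_MODEL-sized vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def fake_embeddings(names):
    """Generate placeholder encoder outputs for the given modalities."""
    return {name: [random.random() for _ in range(MODALITY_DIMS[name])] for name in names}


three = fake_embeddings(["smiles", "selfies", "fingerprint"])
four = fake_embeddings(["smiles", "selfies", "fingerprint", "text"])

print(len(naive_concat(three)), len(naive_concat(four)))
print(len(pool_fixed_size(modality_aware_tokens(three))),
      len(pool_fixed_size(modality_aware_tokens(four))))
```

The point of the sketch is the shape behavior: the concatenated vector changes size whenever a modality is added, whereas the token-based path always yields a vector of width `D_MODEL`, so the same downstream prediction head can be reused for three or four modalities.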

Paper Structure

This paper contains 12 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: LLM-Fusion architecture.
  • Figure 2: LLM-Fusion models used in experiments. (A) shows the three-modality model used for the HOMO, LUMO, and GAP property prediction tasks; (B) shows the four-modality model used for the LogP and QED prediction tasks.