MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction

Fanmeng Wang, Wentao Guo, Minjie Cheng, Shen Yuan, Hongteng Xu, Zhifeng Gao

TL;DR

MMPolymer is a multimodal multitask pretraining framework that jointly uses polymer 1D P-SMILES sequences and 3D conformations to predict polymer properties. A "Star Substitution" strategy derives usable 3D structures from repeating units, enabling 3D-aware pretraining despite limited polymer 3D data. The model is pretrained with masked token prediction, coordinate denoising, and cross-modal alignment to learn coherent representations, and it achieves state-of-the-art results on multiple polymer property datasets, even when fine-tuned with a single modality. The work highlights the role of 3D structural information in polymer informatics and offers a scalable way to make use of scarce polymer 3D data.

Abstract

Polymers are high-molecular-weight compounds formed by covalently bonding many identical or similar monomers, so their 3D structures are complex yet exhibit pronounced regularity. The properties of a polymer, such as plasticity, conductivity, and biocompatibility, are typically highly correlated with its 3D structure. However, existing polymer property prediction methods rely heavily on information learned from polymer SMILES sequences (P-SMILES strings) and ignore crucial 3D structural information, resulting in suboptimal performance. In this work, we propose MMPolymer, a novel multimodal multitask pretraining framework that incorporates both the 1D sequential and the 3D structural information of polymers to improve downstream polymer property prediction. In addition, given the scarcity of polymer 3D data, we introduce the "Star Substitution" strategy to extract 3D structural information effectively. During pretraining, in addition to predicting masked tokens and recovering clean 3D coordinates, MMPolymer aligns the latent representations of the two modalities. We then fine-tune the pretrained MMPolymer on downstream polymer property prediction tasks in a supervised learning paradigm. Experiments show that MMPolymer achieves state-of-the-art performance on these tasks. Moreover, fine-tuning the pretrained MMPolymer with only a single modality also outperforms existing methods, demonstrating its strong capability for polymer feature extraction and utilization.
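
The three pretraining objectives named in the abstract (masked token prediction, 3D coordinate recovery, and cross-modal alignment) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the InfoNCE-style form of the alignment term, the mean-squared-error form of the denoising term, the temperature, and the equal loss weights are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def masked_token_loss(logits, targets, mask):
    """Mean cross-entropy over masked P-SMILES token positions only."""
    total, count = 0.0, 0
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


def denoising_loss(pred_coords, clean_coords):
    """Mean squared error between predicted and clean 3D coordinates (assumed form)."""
    squared = [
        (p - c) ** 2
        for pred_atom, clean_atom in zip(pred_coords, clean_coords)
        for p, c in zip(pred_atom, clean_atom)
    ]
    return sum(squared) / len(squared)


def alignment_loss(seq_emb, struct_emb, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of each batch is a matched 1D/3D pair."""
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    seq = [normalize(v) for v in seq_emb]
    struct = [normalize(v) for v in struct_emb]
    total = 0.0
    for i, s in enumerate(seq):
        sims = [sum(a * b for a, b in zip(s, t)) / temperature for t in struct]
        peak = max(sims)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in sims))
        total += log_z - sims[i]  # cross-entropy with the matched pair as the target
    return total / len(seq)


def pretraining_loss(token_term, coord_term, align_term, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives; equal weights are a placeholder."""
    return sum(w * term for w, term in zip(weights, (token_term, coord_term, align_term)))


# Toy batch: two tokens (one masked), one atom, two polymers with 2-D embeddings.
token_term = masked_token_loss([[0.0, 0.0], [2.0, 0.0]], [0, 0], [True, False])
coord_term = denoising_loss([[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]])
align_term = alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(pretraining_loss(token_term, coord_term, align_term))
```

In this sketch the alignment term falls when the 1D and 3D embeddings of the same polymer are more similar to each other than to those of other polymers in the batch, which is one common way to realize the "cross-modal alignment" the abstract mentions.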

Paper Structure

This paper contains 27 sections, 15 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: The scheme of the proposed method. Here, the red arrows indicate the pipeline of our multimodal multitask pretraining paradigm, and the blue arrows indicate the pipeline of fine-tuning steps for downstream polymer property prediction tasks. The blue modules are designed for 1D sequences (i.e., P-SMILES strings), and the red modules are designed for 3D conformations. The modules shared by 1D and 3D representations are labeled in green.
  • Figure 2: Left: the architecture of our 1D representation network, which takes polymer SMILES sequences (i.e., P-SMILES strings) as input, and outputs corresponding 1D sequential representation. Right: the architecture of our 3D representation network, which takes 3D conformation as input, and outputs corresponding 3D structural representation.
  • Figure 3: Visualization of our "Star Substitution" strategy.
  • Figure 4: t-SNE visualization of MMPolymer on eight polymer property datasets, where the color for each data point is determined by the corresponding ground truth (i.e., property label).