PolyRecommender: A Multimodal Recommendation System for Polymer Discovery

Xin Wang, Yunhao Xiao, Rui Qiao

TL;DR

The paper tackles the challenge of discovering polymers in an enormous chemical space by introducing PolyRecommender, a two-stage framework. It first retrieves candidates using language-based polymer representations, then ranks them with multimodal embeddings that fuse language and graph information. It uses 600-d language embeddings from PolyBERT, fine-tuned with LoRA, and 512-d graph embeddings from a 5-layer D-MPNN. It compares three fusion strategies (Early Fusion, Gated Late Fusion, and Multi-gate Mixture-of-Experts, MMoE) for multitask prediction of $T_g$, $T_m$, and $E_g$. On a dataset of 12,441 polymers from PolyInfo, the authors show that MMoE gives the strongest and most balanced performance across $T_g$ and $E_g$, while $T_g$ and Emg benefit most from multimodal integration. A case study with polyethylene oxide (PEO) confirms that the system retrieves and ranks chemically relevant candidates in practice. Overall, the work establishes a scalable, generalizable multimodal paradigm for AI-guided polymer design that accelerates the discovery of next-generation polymers by leveraging complementary language and structural information.
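The sketch below contrasts the first two fusion strategies using only the Python standard library. It is a minimal illustration, not the authors' code.

Assumptions:
- Early Fusion is plain concatenation of the 600-d PolyBERT vector and the 512-d D-MPNN vector.
- Gated Late Fusion projects each modality to a shared size, then mixes the two with an element-wise sigmoid gate. This form is an illustrative assumption, not the paper's confirmed architecture.
- The helper names and the shared size of 128 are hypothetical.
- Weights are random rather than trained.

```python
import math
import random

LANG_DIM = 600   # PolyBERT embedding size reported in the summary
GRAPH_DIM = 512  # D-MPNN embedding size reported in the summary


def early_fusion(h_lang, h_graph):
    """Early Fusion: concatenate the two modality embeddings into one vector."""
    return list(h_lang) + list(h_graph)


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_late_fusion(h_lang, h_graph, params):
    """Hypothetical Gated Late Fusion: project each modality to a shared size,
    then blend them element-wise with a learned sigmoid gate g in (0, 1)."""
    z_lang = linear(h_lang, params["W_lang"], params["b_lang"])
    z_graph = linear(h_graph, params["W_graph"], params["b_graph"])
    # The gate looks at both projected modalities side by side.
    gates = [sigmoid(v) for v in linear(z_lang + z_graph, params["W_gate"], params["b_gate"])]
    return [g * a + (1.0 - g) * b for g, a, b in zip(gates, z_lang, z_graph)]


def init_params(shared_dim, seed=0):
    """Random small weights; a trained model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_lang": mat(shared_dim, LANG_DIM), "b_lang": [0.0] * shared_dim,
        "W_graph": mat(shared_dim, GRAPH_DIM), "b_graph": [0.0] * shared_dim,
        "W_gate": mat(shared_dim, 2 * shared_dim), "b_gate": [0.0] * shared_dim,
    }


rng = random.Random(1)
h_lang = [rng.gauss(0.0, 1.0) for _ in range(LANG_DIM)]
h_graph = [rng.gauss(0.0, 1.0) for _ in range(GRAPH_DIM)]
print(len(early_fusion(h_lang, h_graph)))                         # 1112
print(len(gated_late_fusion(h_lang, h_graph, init_params(128))))  # 128
```

Each gate value lies in (0, 1), so every fused coordinate is a convex blend of the two projected modalities. This lets the model lean on language or graph information feature by feature.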

Abstract

We introduce PolyRecommender, a multimodal discovery framework that integrates chemical language representations from PolyBERT with molecular graph-based representations from a graph encoder. The system first retrieves candidate polymers using language-based similarity and then ranks them using fused multimodal embeddings according to multiple target properties. By leveraging the complementary knowledge encoded in both modalities, PolyRecommender enables efficient retrieval and robust ranking across related polymer properties. Our work establishes a generalizable multimodal paradigm, advancing AI-guided design for the discovery of next-generation polymers.
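The retrieve-then-rank flow can be sketched as follows.

Assumptions:
- Retrieval uses cosine similarity over language embeddings.
- Ranking uses a scaled Euclidean distance between each candidate's predicted ($T_g$, $T_m$, $E_g$) vector and the query's. This criterion is hypothetical; the paper's actual ranking score may differ.
- The helper names `retrieve` and `rank`, the scales, and all toy values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_lang, library_lang, k):
    """Stage 1: keep the k polymers whose language embeddings are most similar to the query."""
    ranked = sorted(library_lang, key=lambda pid: cosine(query_lang, library_lang[pid]), reverse=True)
    return ranked[:k]


def rank(query_props, candidates, predict, scales):
    """Stage 2 (hypothetical criterion): order candidates by the scaled distance
    between their predicted property vector and the query's."""
    def distance(pid):
        pred = predict(pid)
        return math.sqrt(sum(((p - q) / s) ** 2 for p, q, s in zip(pred, query_props, scales)))
    return sorted(candidates, key=distance)


# Toy 2-d "language embeddings" and made-up (Tg, Tm, Eg) predictions.
library = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.7, 0.7]}
predicted = {"A": (200.0, 330.0, 5.0), "B": (210.0, 340.0, 5.5),
             "C": (150.0, 250.0, 3.0), "D": (240.0, 300.0, 4.0)}

shortlist = retrieve([1.0, 0.0], library, k=3)
print(shortlist)  # ['A', 'B', 'D']
print(rank((215.0, 335.0, 5.4), shortlist, predicted.__getitem__, scales=(100.0, 100.0, 1.0)))  # ['B', 'A', 'D']
```

Splitting the work this way keeps the more expensive multimodal predictor off the full library: only the k retrieved candidates are scored in the second stage.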

Paper Structure

This paper contains 11 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) The polymer recommender, in which polymers are recalled and ranked based on their similarity to the query polymer. (b) The detailed workflow for recommending candidates from the search space, including material representation generation, candidate retrieval, fusion of multimodal representations, and multi-task prediction for ranking.
  • Figure 2: Two-dimensional UMAP projection of the multimodal polymer embeddings. The distribution is colored by three distinct properties: (a) $T_{g}$, (b) $T_{m}$, and (c) $E_g$.
  • Figure 3: MMoE model analysis: (a) heatmap of task-specific expert utilization; (b) predicted $T_m$ distribution and (c) UMAP projection in space for the top 50 candidates from a PEO query.
  • Figure 4: Distributions of available experimental data for the three target properties.
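The MMoE idea behind the expert-utilization heatmap in Figure 3(a) can be illustrated with a minimal sketch. A shared pool of experts feeds a separate softmax gate per task, and each task has its own output tower. Averaging each task's gate weights over a batch gives a task-by-expert utilization table like the one the heatmap visualizes.

The following are all assumptions for illustration, not the authors' configuration:
- the number of experts and the layer sizes;
- ReLU experts;
- the 16-d toy input standing in for a fused embedding;
- the random weights.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MMoE:
    """Minimal Multi-gate Mixture-of-Experts: shared experts, one softmax gate
    and one scalar output tower per task (layer sizes are hypothetical)."""

    def __init__(self, in_dim, expert_dim, n_experts, tasks, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.tasks = list(tasks)
        self.experts = [(mat(expert_dim, in_dim), [0.0] * expert_dim) for _ in range(n_experts)]
        self.gates = {t: (mat(n_experts, in_dim), [0.0] * n_experts) for t in self.tasks}
        self.towers = {t: (mat(1, expert_dim), [0.0]) for t in self.tasks}

    def forward(self, x):
        # Every expert sees the same input; ReLU keeps the sketch non-linear.
        expert_out = [[max(0.0, v) for v in linear(x, W, b)] for W, b in self.experts]
        preds, gate_weights = {}, {}
        for t in self.tasks:
            w = softmax(linear(x, *self.gates[t]))  # task-specific mixture weights over experts
            mixed = [sum(w_e * out[d] for w_e, out in zip(w, expert_out))
                     for d in range(len(expert_out[0]))]
            preds[t] = linear(mixed, *self.towers[t])[0]
            gate_weights[t] = w
        return preds, gate_weights


def expert_utilization(model, batch):
    """Average gate weight per (task, expert): the quantity a utilization heatmap shows."""
    usage = {t: [0.0] * len(model.experts) for t in model.tasks}
    for x in batch:
        _, gate_weights = model.forward(x)
        for t in model.tasks:
            for e, w in enumerate(gate_weights[t]):
                usage[t][e] += w / len(batch)
    return usage


rng = random.Random(42)
batch = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]  # stand-in fused embeddings
model = MMoE(in_dim=16, expert_dim=8, n_experts=4, tasks=["Tg", "Tm", "Eg"])
for task, row in expert_utilization(model, batch).items():
    print(task, [round(u, 3) for u in row])
```

Because each task has its own gate, tasks can share some experts and specialize others. This is the pattern a per-task utilization heatmap is meant to reveal.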