polyBART: A Chemical Linguist for Polymer Property Prediction and Generative Design

Anagha Savit, Harikrishna Sahu, Shivank Shukla, Wei Xiong, Rampi Ramprasad

TL;DR

Polymers inhabit an enormous design space, making targeted property optimization challenging. The authors introduce polyBART, a polymer foundation model built by continued pretraining of molecular SELFIES-based TED models on PSELFIES, enabling bidirectional translation between polymer structures and properties. They demonstrate state-of-the-art performance in property prediction and show generative design capabilities, including the synthesis and validation of a language-model-designed polymer with high thermal stability. The PSELFIES representation guarantees chemical validity and compatibility with existing molecular language models, enabling rapid exploration and design of novel polymers. This work provides a generalizable pathway to adapt molecular foundation models to polymer space and establishes a foundation for generative, property-driven polymer design that can be validated experimentally.

Abstract

Designing polymers for targeted applications and accurately predicting their properties is a key challenge in materials science owing to the vast and complex polymer chemical space. While molecular language models have proven effective in solving analogous problems for molecular discovery, similar advancements for polymers are limited. To address this gap, we propose polyBART, a language model-driven polymer discovery capability that enables rapid and accurate exploration of the polymer design space. Central to our approach is Pseudo-polymer SELFIES (PSELFIES), a novel representation that allows for the transfer of molecular language models to the polymer space. polyBART is, to the best of our knowledge, the first language model capable of bidirectional translation between polymer structures and properties, achieving state-of-the-art results in property prediction and design of novel polymers for electrostatic energy storage. Further, polyBART is validated through a combination of both computational and laboratory experiments. We report what we believe is the first successful synthesis and validation of a polymer designed by a language model, predicted to exhibit high thermal degradation temperature and confirmed by our laboratory measurements. Our work presents a generalizable strategy for adapting molecular language models to the polymer space and introduces a polymer foundation model, advancing generative polymer design that may be adapted for a variety of applications.

Paper Structure

This paper contains 30 sections, 3 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Overview of the polyBART pipeline. (a) The encoder-decoder model is pretrained using a Masked Language Modeling (MLM) objective to learn the language of PSELFIES and construct the latent space. (b) For property prediction, the trained encoder generates embeddings for polymers with known properties, which are then mapped to target values using a Gaussian Process Regressor (GPR). (c) For generating new structures, Gaussian noise is added to the learned embeddings, and the decoder generates candidate polymers, which are subsequently filtered based on property and synthesizability criteria.
  • Figure 2: (a) PSMILES containing terminal [*] groups are first transformed into a cyclic structure. (b) The cyclic structure is canonicalized, and a chemically valid bond is cleaved to linearize the molecule. The resulting termini are tagged with At atoms, yielding MSMILES. (c) The MSMILES is then transformed to PSELFIES.
  • Figure 3: Representative chemical structures generated by (a) polyBARTsmall and (b) polyBARTlarge, selected for their high Tg, high Eg, and ease of synthesis. The selected examples highlight polyBART's ability to generate thermally and electronically robust polymers.
  • Figure 4: A high Tg polymer structure generated by polyBART and synthesized in the lab.
  • Figure 5: Distribution of thermal properties.
  • ...and 7 more figures
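
The prediction and generation steps in Figure 1 (b) and (c) can be illustrated with a small sketch. It is not the authors' implementation. The 2-D "embeddings", the property values, the kernel choice (RBF), the noise scale `sigma`, and the `threshold` are all toy assumptions made for illustration. The trained encoder and decoder are not available here, so the decoding step is omitted and the noisy embeddings themselves stand in for decoded candidates. The Gaussian process posterior mean is written out in plain Python so the example has no dependencies.

```python
import math
import random


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two embedding vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gpr_fit(X, y, noise=1e-6):
    """Return alpha = (K + noise * I)^-1 y, the weights of the GP posterior mean."""
    K = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
        for i, a in enumerate(X)
    ]
    return solve(K, y)


def gpr_predict(X, alpha, x_new):
    """GP posterior mean at x_new: sum_i alpha_i * k(x_i, x_new)."""
    return sum(w * rbf(x, x_new) for x, w in zip(X, alpha))


def perturb(embedding, sigma, rng):
    """Add independent Gaussian noise to each embedding coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in embedding]


# Toy data (hypothetical): four 2-D "embeddings" with made-up property values.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]

# (b) Property prediction: fit the GP on embeddings of known polymers.
alpha = gpr_fit(X, y)

# (c) Generation: sample noisy neighbours of a seed embedding, then keep
# those whose predicted property clears an assumed threshold.
rng = random.Random(0)
seed_embedding = X[3]
sigma = 0.1       # assumed noise scale
threshold = 3.5   # assumed property cutoff
candidates = [perturb(seed_embedding, sigma, rng) for _ in range(5)]
kept = [c for c in candidates if gpr_predict(X, alpha, c) >= threshold]
```

In a real pipeline the embeddings would come from the pretrained encoder, each noisy vector would be passed through the decoder to obtain a PSELFIES string, and a synthesizability filter would be applied alongside the property filter, as the caption describes.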