polyRETRO: a Language Model Approach to predict Polymerization Class and Monomer(s) for a Target Polymer
Sakshi Agarwal, Wei Xiong, Rampi Ramprasad
TL;DR
polyRETRO tackles the challenge of translating ML-designed polymers into experimentally feasible routes by integrating large language models (LLMs) into a two-objective retrosynthetic framework. The method first predicts the polymerization class from a target SMILES string, then infers reaction templates and monomers to reconstruct viable synthesis paths; expressing the templates in natural language improves interpretability and generalization. The approach achieves a polymerization-class accuracy of $0.98$, strong template accuracy for addition and condensation polymerization, and a monomer-prediction accuracy near $0.97$ for the best model, while ring-opening routes map directly from the repeat unit. This work provides a scalable, interpretable bridge between in silico polymer design and lab-scale synthesis, enabling faster experimental validation and broader exploration of synthetic polymer space.
Abstract
While machine learning has transformed polymer design by enabling rapid property prediction and candidate generation, translating these designs into experimentally realizable materials remains a critical challenge. Traditionally, the synthesis of target polymers has relied heavily on expert intuition and prior experience. The lack of automated retrosynthetic tools to assist chemists limits the practical impact of data-driven polymer discovery. To expedite lab-scale validation and beyond, we present a retrosynthetic framework that leverages large language models (LLMs) to guide polymer synthesis. Our approach, which we call polyRETRO, involves two key steps: 1) predicting the most likely polymerization reaction class of a target polymer and 2) identifying the underlying chemical transformation templates and the corresponding monomers, primarily using natural-language-based constructs. This LLM-driven framework enables direct retrosynthetic analysis given just the target polymer SMILES string. polyRETRO constitutes an initial step towards a scalable, interpretable, and generalizable approach to bridging the gap between computational design and experimental synthesis.
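The two-step flow described above can be sketched as follows. This is a minimal illustration of the control flow only, not the authors' implementation: the names `RetroResult`, `retro_pipeline`, and the three `toy_*` functions are hypothetical, and the toy functions return hard-coded answers for polyethylene where a real system would call trained language models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetroResult:
    """Container for one retrosynthetic prediction (hypothetical structure)."""
    polymer_smiles: str
    reaction_class: str
    template: str
    monomers: List[str]


def retro_pipeline(
    polymer_smiles: str,
    predict_class: Callable[[str], str],
    predict_template: Callable[[str, str], str],
    predict_monomers: Callable[[str, str, str], List[str]],
) -> RetroResult:
    """Run step 1 (reaction class), then step 2 (template and monomers)."""
    # Step 1: most likely polymerization class for the target polymer.
    reaction_class = predict_class(polymer_smiles)
    # Step 2a: natural-language transformation template, conditioned on the class.
    template = predict_template(polymer_smiles, reaction_class)
    # Step 2b: monomer SMILES consistent with the class and the template.
    monomers = predict_monomers(polymer_smiles, reaction_class, template)
    return RetroResult(polymer_smiles, reaction_class, template, monomers)


# Toy stand-ins for the models (assumptions for illustration only).
def toy_class(polymer_smiles: str) -> str:
    return "addition"


def toy_template(polymer_smiles: str, reaction_class: str) -> str:
    return "open the C=C double bond of the monomer to form the backbone"


def toy_monomers(polymer_smiles: str, reaction_class: str, template: str) -> List[str]:
    return ["C=C"]  # ethylene


# Polyethylene repeat unit, with [*] marking the chain attachment points.
result = retro_pipeline("[*]CC[*]", toy_class, toy_template, toy_monomers)
print(result)
```

Passing the three predictors in as callables keeps the sketch independent of any particular model or prompting scheme; each stage could be swapped for a real predictor without changing the pipeline function.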
