ADEPT-PolyGraphMT: Automated Molecular Simulation and Multi-Task Multi-Fidelity Machine Learning for Polymer Property Generation and Prediction

Sobin Alosious, Yuhan Liu, Jiaxin Xu, Gang Liu, Renzheng Zhang, Meng Jiang, Tengfei Luo

Abstract

The discovery of polymers with targeted properties is challenged by the vast chemical design space and the limited availability of consistent, high-quality data across multiple properties. In this work, an integrated polymer informatics framework is presented that combines the Automated molecular Dynamics Engine for Polymer simulaTions (ADEPT) workflow with multi-task and multi-fidelity machine learning (PolyGraphMT). Polymer repeat units are represented as molecular graphs and processed using a graph neural network to learn structure-property relationships. Starting from SMILES representations for monomers, ADEPT automates the construction of atomistic models and the evaluation of their properties using molecular dynamics simulations and density functional theory calculations. The simulation data are combined with curated experimental data and group contribution theory estimates to construct a unified dataset of approximately 62,000 polymer property values spanning 28 properties. Using this dataset, inter-property correlations are analyzed, and multi-task learning strategies are evaluated for joint property prediction. The results show that multi-task models achieve performance comparable to single-task models in data-rich regimes and exhibit superior accuracy as training data become limited. In addition, fidelity-aware training improves predictive accuracy when combining experimental and computational data sources. The trained models are further applied to large-scale property prediction for polymers in the PolyInfo database and the PI1M virtual polymer library, producing physically consistent property distributions across a broad chemical space. Overall, the proposed framework provides a structured approach for scalable prediction and screening of polymer properties across multiple property types and data fidelity levels.

Paper Structure

This paper contains 24 sections, 20 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overview of the ADEPT and PolyGraphMT workflow. Polymer SMILES are processed through the ADEPT pipeline to generate thermal, mechanical, structural, transport, electronic, and optical properties using MD and DFT. These computational data are combined with curated experimental measurements to construct a unified dataset, which is used to train multi-task, multi-fidelity ML models for polymer property prediction and large-scale screening.
  • Figure 2: NEMD workflow and validation for $\kappa$ calculations. (a) Representative MD snapshot illustrating the NEMD setup, where a constant heat flux is imposed between hot and cold regions using thermostatted slabs, with boundary layers composed of fixed atoms. (b) Cumulative energy exchanged with the heat reservoirs as a function of time; a linear fit is used to obtain the heat flux $J$. (c) Steady-state temperature profile along the transport direction; the linear region is fitted to extract the temperature gradient $\nabla T$. (d) Comparison of $\kappa$ values obtained from MD simulations with experimental data, demonstrating good agreement between NEMD predictions and experiments.
  • Figure 3: (a) Temperature-dependent density profile used to estimate $T_g$ from the intersection of linear fits to the low- and high-temperature regimes. (b) Parity plot comparing MD-predicted and experimental $T_g$. (c) Time evolution of $K$ obtained from independent simulations with different initial configurations, illustrating temporal fluctuations and convergence. (d) Parity plot comparing MD-predicted and experimental $K$.
  • Figure 4: (a) Parity plot comparing MD-predicted $\rho$ with experimental values, showing systematic deviation. (b) Density parity plot after bias correction, demonstrating improved agreement between MD and experimental $\rho$. (c) Representative enthalpy–temperature relationship obtained from MD simulations; block-averaged enthalpy values are fitted linearly to extract $C_p$. (d) Parity plot comparing MD-predicted $C_p$ with experimental $C_p$, highlighting significant bias in raw MD predictions. (e) $C_p$ parity plot after bias correction, showing improved correlation and reduced error relative to experimental data.
  • Figure 5: Polymer property heatmap of Pearson correlation coefficients. Red regions indicate positive correlations, blue regions indicate negative correlations, and white regions indicate negligible correlations. Only property pairs with at least 30 data points ($N \geq 30$) are included.
  • ...and 4 more figures
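
The Figure 5 caption describes a Pearson correlation heatmap in which a property pair is included only when at least 30 polymers carry values for both properties ($N \geq 30$). The sketch below shows one way such a pairwise filter could be computed. It is not the authors' code: the table layout, the function names, and the synthetic values are all assumptions made for illustration, and the property keys (`Tg`, `rho`, `kappa`) only borrow the symbols used in the captions.

```python
import math

MIN_PAIRS = 30  # threshold taken from the Figure 5 caption (N >= 30)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns None when either sequence has zero variance, since the
    coefficient is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlation(table, prop_a, prop_b, min_pairs=MIN_PAIRS):
    """Correlate two properties over the polymers that report both.

    `table` is a hypothetical layout: polymer id -> {property name: value},
    with missing properties simply absent. Returns None when fewer than
    `min_pairs` polymers have both values.
    """
    pairs = [
        (row[prop_a], row[prop_b])
        for row in table.values()
        if prop_a in row and prop_b in row
    ]
    if len(pairs) < min_pairs:
        return None
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Synthetic data, for illustration only (not values from the paper):
# 40 polymers with Tg and rho, of which only 10 also have kappa.
table = {f"P{i}": {"Tg": 300.0 + 5.0 * i, "rho": 1.0 + 0.01 * i} for i in range(40)}
for i in range(10):
    table[f"P{i}"]["kappa"] = 0.2 + 0.001 * i

r_tg_rho = pairwise_correlation(table, "Tg", "rho")      # 40 shared polymers
r_tg_kappa = pairwise_correlation(table, "Tg", "kappa")  # only 10 shared polymers

print("Tg vs rho:", r_tg_rho)
print("Tg vs kappa:", r_tg_kappa)
```

Computing each coefficient only over the polymers that report both properties lets a sparse multi-property dataset be used without imputation. The minimum-count filter keeps pairs with too little overlap out of the heatmap.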