Extrapolation of Machine-Learning Interatomic Potentials for Organic and Polymeric Systems

Natalie E. Hooven, Arthur Y. Lin, Rose K. Cersonsky

TL;DR

This work investigates how machine-learned interatomic potentials (MLIPs) trained on short-chain alkanes can extrapolate to longer polyalkanes under the same conditions. Using a dataset spanning n = 1–8 and analyzing both energy and force predictions, the study shows that force extrapolation improves with chain length and converges when chemical environments are sufficiently sampled; total energy extrapolation, however, requires accounting for composition-driven mean shifts. The authors introduce a far-sighted SOAP representation to emphasize intermolecular contributions, which substantially enhances extrapolation of intermolecular energetics and provides a practical blueprint for building transferable MLIPs for polymeric systems. Overall, the work offers a roadmap for constructing polymer-appropriate MLIPs with manageable training data by focusing on environment convergence and targeted energy components, enabling scalable simulations of macromolecular materials.
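One common way to handle a composition-driven mean shift is to fit a linear per-element baseline on atom counts and train on the residual energies. The sketch below illustrates that idea only; it is not necessarily the procedure used by the authors. The function names, the toy alkane compositions, and the synthetic energies are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def fit_composition_baseline(counts, energies):
    """Least-squares fit of per-element reference energies from atom counts.

    counts:   (n_structures, n_elements) array of atom counts
    energies: (n_structures,) array of total energies
    """
    coeffs, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    return coeffs


def remove_baseline(counts, energies, coeffs):
    """Subtract the composition-only energy estimate from each total energy."""
    return energies - counts @ coeffs


# Hypothetical toy data: n-alkanes C_n H_(2n+2) for n = 1..4, columns (n_C, n_H).
n = np.arange(1, 5)
counts = np.column_stack([n, 2 * n + 2]).astype(float)

# Synthetic energies (arbitrary units): a linear composition term plus a small residual.
assumed_per_atom = np.array([-5.0, -3.0])
residual = np.array([0.01, -0.02, 0.015, -0.005])
energies = counts @ assumed_per_atom + residual

coeffs = fit_composition_baseline(counts, energies)
shifted = remove_baseline(counts, energies, coeffs)

# The same baseline can be evaluated for a longer chain (n = 8) outside the training range.
counts_long = np.array([[8.0, 18.0]])
baseline_long = counts_long @ coeffs
```

After the baseline is removed, the remaining targets are centered near zero regardless of chain length, so a model trained on short chains is not asked to extrapolate the mean energy itself.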

Abstract

Machine-Learning Interatomic Potentials (MLIPs) have surged in popularity due to their promise of expanding the spatiotemporal scales possible for simulating molecules with high fidelity. The accuracy of any MLIP depends on the data used for its training; thus, for large molecules such as polymers, where accurate training data is prohibitively difficult to obtain, it becomes necessary to pursue non-traditional methods to construct MLIPs, many of which rely on training with smaller, analogous chemical systems. However, we have yet to understand the limits to which smaller molecules can be used as a proxy for extrapolating macromolecular energetics. Here, we provide a "control study" for such experiments, exploring the ability of MLIP approaches to extrapolate between $n=1-8$ n-polyalkanes at identical conditions. Through Principal Covariates Classification, we quantitatively demonstrate how convergence in chemical environments between training and testing datasets coincides with an MLIP's transferability. Additionally, we show how careful attention to the construction of an MLIP's neighbor list can promote greater transferability when considering various levels of the energetic hierarchy. Our results establish a roadmap for how one can create transferable MLIPs for macromolecular systems without the prohibitive cost of constructing system-specific training data.

Paper Structure

This paper contains 15 sections, 6 equations, 9 figures.

Figures (9)

  • Figure 1: How well do ML potentials built on short-chain molecules generalize to their longer counterparts?
  • Figure 2: Results of MACE potential building on the total potential energy on $n=1-8$ alkanes.
  • Figure 3: Distribution of CH$_2$ environments for the different training sets, mapped using Principal Covariates Classification (PCovC) [jorgensen_interpretable_2025]. The map demonstrates the convergence of environments with increasing $n$.
  • Figure 4: Results of SOAP-Ridge MLIPs trained on intermolecular potential energy using different SOAP representations.
  • Figure S1: Additional visualizations of MACE models to predict total energy and forces.
  • ...and 4 more figures