Generative Modeling of Entangled Polymers with a Distance-Based Variational Autoencoder

Pietro Chiarantoni, Oscar Serra, Mohammad Erfan Mowlaei, Venkata Surya Kumar Choutipalli, Mark DelloStritto, Xinghua Shi, Michael L. Klein, Vincenzo Carnevale

TL;DR

Generating dense polymer globule configurations via molecular dynamics is computationally intensive, especially in melts. The authors train a distance-matrix variational autoencoder with a Conv-Transformer encoder and a Gaussian mixture latent prior, then generate new configurations by decoding latent samples and embedding with multidimensional scaling followed by short SDK MD relaxation. Reconstructions reproduce key observables such as the radial distribution function and topological measures, while generated samples can be physically viable and novel, albeit with a broader energy distribution that requires filtering. Overall, the framework offers a scalable approach to embed, sample, and generate dense polymer configurations, with potential extensions to coordinate-based and atomistic simulations for broader applicability.

Abstract

We present a variational autoencoder framework for learning and generating configurations of structured polymer globules from distance matrices. We sampled polyethylene structures with coarse-grained molecular dynamics and used them as the training set for our deep learning model. By combining convolution and attention layers, the model encodes the structural patterns of distance matrices into an organized and roto-translationally invariant latent space of lower dimensionality. The generative capability of the variational autoencoder, coupled with a post-processing pipeline based on multidimensional scaling and short molecular dynamics, enables the recovery of physically meaningful configurations. The reconstructed and generated samples reproduce key observables, including energy, size, and entanglement, despite minor discrepancies in the raw decoder output.
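The two geometric ingredients named in the abstract, distance matrices and multidimensional scaling, can be illustrated with a minimal NumPy sketch. It assumes classical (Torgerson) MDS, which may differ from the MDS variant the authors use, and it runs on a toy random-walk chain rather than SDK polyethylene data. The function names `distance_matrix` and `classical_mds` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, dim=3):
    """Embed a distance matrix in `dim` dimensions (classical MDS, an assumption)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering the squared distances gives the Gram matrix of centered coordinates.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues, which appear when the input is not exactly Euclidean.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

rng = np.random.default_rng(0)
chain = np.cumsum(rng.normal(size=(400, 3)), axis=0)  # toy chain, not SDK data
dist = distance_matrix(chain)

# A rigid motion of the chain leaves the distance matrix unchanged.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
moved = chain @ q.T + np.array([1.0, -2.0, 3.0])
dist_moved = distance_matrix(moved)

# MDS recovers coordinates whose distances match the input matrix.
embedded = classical_mds(dist)
dist_embedded = distance_matrix(embedded)
```

Because the distance matrix is unchanged by rotations and translations, any representation built on it inherits that invariance. The embedding returned by MDS is defined only up to a rigid motion and a reflection, so mirror images cannot be told apart from distances alone.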

Paper Structure

This paper contains 5 sections, 3 equations, 6 figures.

Figures (6)

  • Figure 1: Molecular dynamics simulation setup and Variational autoencoder architecture. (a) Coarse-grained polyethylene chain with 400 effective monomers simulated via molecular dynamics using the SDK model. The system is initialized in a swollen state (left) and, after equilibration, reaches globular configurations (right) in both melted ($T > T^{*}$) and semi-crystalline states ($T < T^{*}$), with $T^{*}$ indicating the nominal crystallization temperature of $\sim 280 \,\text{K}$. Monomers are colored according to their backbone index. (b) Distance matrices computed for the samples shown in panel a. Below $T^{*}$, characteristic crystallization patterns arising from aligned PE stems appear. (c) Variational Autoencoder framework. Input distance matrices are first fed into an encoder composed of a CCT-based downsampling module and transformer blocks, which map them into latent variables ($\mu$, $\sigma$). Latent vectors $z$ are then sampled and passed through a decoder mirroring the encoder module to reconstruct distance matrices, which are finally used to compute loss functions and update the network weights. (d) Example of generated distance matrix obtained by sampling and decoding the learned latent distribution. Generated matrices are post-processed into 3D coordinates using multidimensional scaling and relaxed with short SDK simulations.
  • Figure 2: Reconstruction of structural patterns from input distance matrices. (a) Radial distribution functions $g(r)$ of input (black) and reconstructed (yellow) matrices at $T = 200$ K. The main peaks of the distribution are highlighted by shaded regions. (b) Example submatrices ($70 \times 70$) extracted along the main diagonal of an input (left) and the corresponding reconstructed (right) distance matrix, with elements colored according to the $g(r)$ peaks in panel (a). (c) Probability distribution of the triangle inequality violation degree $v$ for all reconstructed samples at $T = 200$ K.
  • Figure 3: Properties of embedded reconstructed samples. (a) Example of reconstructed distance matrix at $T = 200$ K, post-processed via multidimensional scaling (MDS) and SDK energy minimization, yielding a three-dimensional configuration corresponding to the nearest local energy minimum. (b) Probability distributions of the root-mean-squared distance (RMSD) between optimally aligned reconstructed and input configurations for the three explored temperatures, with SDK-minimized configurations (solid lines) compared to raw MDS embeddings (dashed lines). Inset: the same distributions computed using uncorrelated canonical samples from MD simulations. All the distributions are smoothed using a Gaussian KDE for visual clarity. (c) Scatter plots comparing input and reconstructed physical observables: potential energy $E$ (left), radius of gyration $R_{g}$ (middle), and writhe $W$ (right). Points are colored by input temperature (same scheme as panel b), with Pearson correlation coefficients $r$ reported for each temperature.
  • Figure 4: Generation and embedding of novel samples. (a) Example of generated distance matrix at $T = 200$ K embedded by multidimensional scaling and minimized with the SDK potential. (b) Potential energy trajectories during MD relaxation of $100$ generated samples at $T = 200$ K. Convergence is reached on the order of 3 ps (vertical dashed line), but the stationary energy distribution is broader than that of the canonical ensemble. The ensemble mean $\langle E \rangle$ (solid red line) and standard deviation $\sigma$ (dashed black line) are indicated, and only configurations with relaxed energies within $\langle E \rangle \pm 2\sigma$ are kept. (c) Final generated configurations after the filtering procedure described in panel b. These configurations reproduce the radial distribution function $g(r)$ in agreement with the corresponding input at $T = 200$ K.
  • Figure 5: Physical observables of embedded generated samples. Probability distributions of (a) the radius of gyration $R_{g}$ and (b) the writhe $W$ for input (solid lines) and generated embedded (dashed lines) configurations at the three explored temperatures. All the distributions are smoothed using a Gaussian KDE for visual clarity and curves are colored by input temperature.
  • ...and 1 more figure