
Inverse Design of Copolymers Including Stoichiometry and Chain Architecture

Gabriel Vogel, Jana M. Weber

TL;DR

This work builds upon a recent polymer representation that includes stoichiometries and chain architectures of monomer ensembles and develops a novel variational autoencoder (VAE) architecture encoding a graph and decoding a string to enable the handling of partly labelled datasets.

Abstract

The demand for innovative synthetic polymers with improved properties is high, but their structural complexity and vast design space hinder rapid discovery. Machine learning-guided molecular design is a promising approach to accelerate polymer discovery. However, the scarcity of labelled polymer data and the complex hierarchical structure of synthetic polymers make generative design particularly challenging. We go beyond current state-of-the-art approaches by generating not only repeating units but entire monomer ensembles, including their stoichiometry and chain architecture. We build upon a recent polymer representation that includes stoichiometries and chain architectures of monomer ensembles and develop a novel variational autoencoder (VAE) architecture that encodes a graph and decodes a string. Using a semi-supervised setup, we enable the handling of partly labelled datasets, which can be beneficial in domains with only a small corpus of labelled data. Our model learns a continuous, well-organized latent space (LS) that enables de novo generation of copolymer structures with different monomer stoichiometries and chain architectures. In an inverse design case study, we demonstrate our model for the in silico discovery of novel conjugated copolymer photocatalysts for hydrogen production by optimizing the polymers' electron affinity and ionization potential in the latent space.
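The semi-supervised training idea can be sketched as a loss that combines the usual VAE terms with a property loss applied only to labelled polymers. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the following:

- The encoder outputs a log-variance `log_var` rather than the variance.
- The objective is a standard beta-weighted VAE loss (reconstruction negative log-likelihood plus a KL term to N(0, I)).
- The property head is trained with a mean-squared-error loss on (EA, IP).

The function names and the weights `beta` and `alpha` are hypothetical. The Transformer decoder is out of scope here, so its reconstruction loss is passed in as a precomputed scalar.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl_per_sample = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(np.mean(kl_per_sample))


def masked_property_mse(pred, target, labelled):
    """MSE over labelled polymers only; unlabelled rows may hold NaN targets."""
    if not labelled.any():
        return 0.0  # fully unlabelled batch: no property signal
    diff = pred[labelled] - target[labelled]
    return float(np.mean(diff**2))


def semi_supervised_loss(recon_nll, mu, log_var, pred, target, labelled,
                         beta=1.0, alpha=1.0):
    """Reconstruction + beta * KL + alpha * property loss (hypothetical weights)."""
    return (recon_nll
            + beta * kl_to_standard_normal(mu, log_var)
            + alpha * masked_property_mse(pred, target, labelled))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, latent_dim = 4, 8
    mu = rng.standard_normal((batch, latent_dim))
    log_var = 0.1 * rng.standard_normal((batch, latent_dim))
    z = reparameterize(mu, log_var, rng)  # would feed the decoder and property head
    # Hypothetical (EA, IP) predictions; NaN marks polymers without labels.
    pred = rng.standard_normal((batch, 2))
    target = np.array([[1.0, 2.0], [np.nan, np.nan], [0.5, 1.5], [np.nan, np.nan]])
    labelled = ~np.isnan(target).any(axis=1)
    print(z.shape, semi_supervised_loss(3.2, mu, log_var, pred, target, labelled))
```

Masking, rather than dropping unlabelled polymers, lets them still contribute to the reconstruction and KL terms. This is what makes a large unlabelled corpus useful alongside a small labelled one.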

Paper Structure (40 sections, 8 equations, 20 figures, 2 tables)

Figures (20)

  • Figure 1: Two aspects of the complexity of synthetic polymer structures that need to be considered in the design due to their impact on the polymer properties. (A) Polymers possess a hierarchical structure from monomer structures, their composition (homopolymer, copolymer, etc.) and stoichiometry to chain architecture and linking structure. (B) Polymers are often stochastic materials composed of macromolecules of different lengths and weights.
  • Figure 2: Polymer representations accounting for the stoichiometry of monomer ensembles, the chain architecture, and the stochastic nature of polymers. The graph representation is adopted from aldeghi2022graph, with stochastic edges (dashed) reflecting the connection probabilities between monomers, i.e. the chain architecture. The string representation is a text-based description of the polymer graph representation that concatenates monomer SMILES, stoichiometry and connection probabilities.
  • Figure 3: Semi-supervised Graph-2-string VAE architecture. The polymers are represented as graphs and encoded by a wD-MPNN (weighted directed message passing neural network) to obtain mean $\mu$ and variance $\sigma^2$ tensors. The latent representation $\boldsymbol{z}$ is sampled from a normal distribution parametrized by $\mu$ and $\sigma$ using the reparametrization trick. The latent representation $\boldsymbol{z}$ is fed both to a feed-forward neural network that predicts polymer properties (for labelled data) and to the Transformer decoder that reconstructs the polymer in string format.
  • Figure 4: Copolymer photocatalyst dataset from aldeghi2022graph used in this work. The polymer space consists of 9 A-monomers and 682 B-monomers combined in three stoichiometries (1:1, 1:3, 3:1) and three chain architectures (alternating, block, random). This yields a dataset of 42966 copolymers (9 × 682 × 7, i.e. seven stoichiometry/architecture combinations per A-B monomer pair) with DFT-calculated ionization potential (IP) and electron affinity (EA). We create an augmented dataset without property labels (ca. 3 times the size) by additionally allowing B-B copolymers.
  • Figure 5: 56 example copolymers sampled from Gaussian noise. All sampled polymers belong to the same class of conjugated copolymers as the training data and display a wide variety of monomer structures combined in different chain architectures and stoichiometries.
  • ...and 15 more figures
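The string representation described for Figure 2 (monomer SMILES, stoichiometry and connection probabilities concatenated into one string) can be sketched as follows. This is a hypothetical illustration, not the paper's exact syntax. The format is assumed to follow the style of polymer strings used with the wD-MPNN of aldeghi2022graph:

- monomer SMILES with numbered attachment points `[*:n]`, joined by `.`;
- then `|`-separated stoichiometric fractions, one per monomer;
- then edges written as `<i-j:w_ij:w_ji`.

The class and function names are hypothetical. The example monomers (para-phenylene and 2,5-thiophene) are illustrative and not taken from the dataset. The alternating-copolymer edge weights (each A attachment point bonds to either B attachment point with probability 0.5) are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StochasticEdge:
    i: int        # attachment-point index on one monomer, e.g. [*:1]
    j: int        # attachment-point index on the other monomer, e.g. [*:3]
    w_ij: float   # probability of the bond seen from point i
    w_ji: float   # probability of the bond seen from point j


def polymer_string(monomers, stoichiometry, edges):
    """Concatenate monomer SMILES, stoichiometric fractions and stochastic edges."""
    if len(monomers) != len(stoichiometry):
        raise ValueError("need one stoichiometric fraction per monomer")
    if abs(sum(stoichiometry) - 1.0) > 1e-9:
        raise ValueError("stoichiometric fractions must sum to 1")
    if any(not (0.0 <= w <= 1.0) for e in edges for w in (e.w_ij, e.w_ji)):
        raise ValueError("connection probabilities must lie in [0, 1]")
    parts = [".".join(monomers)] + [f"{f:g}" for f in stoichiometry]
    parts.append("".join(f"<{e.i}-{e.j}:{e.w_ij:g}:{e.w_ji:g}" for e in edges))
    return "|".join(parts)


def parse_polymer_string(s):
    """Inverse of polymer_string (assumes no '.' or '|' inside a monomer SMILES)."""
    monomer_part, *rest = s.split("|")
    *stoich_parts, edge_part = rest
    edges = []
    for token in edge_part.split("<")[1:]:  # first chunk before '<' is empty
        pair, w_ij, w_ji = token.split(":")
        i, j = pair.split("-")
        edges.append(StochasticEdge(int(i), int(j), float(w_ij), float(w_ji)))
    return monomer_part.split("."), [float(x) for x in stoich_parts], edges


if __name__ == "__main__":
    a = "[*:1]c1ccc([*:2])cc1"   # illustrative A-monomer (para-phenylene)
    b = "[*:3]c1ccc([*:4])s1"    # illustrative B-monomer (2,5-thiophene)
    # Alternating 1:1 copolymer: every A point bonds to a B point.
    alternating = [StochasticEdge(i, j, 0.5, 0.5) for i in (1, 2) for j in (3, 4)]
    s = polymer_string([a, b], [0.5, 0.5], alternating)
    print(s)
    print(parse_polymer_string(s))
```

A flat string like this is convenient as a decoder target because a sequence model can emit the monomers, stoichiometry and chain architecture token by token. The graph form remains the input to the message-passing encoder.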