VAE for Modified 1-Hot Generative Materials Modeling, A Step Towards Inverse Material Design

Khalid El-Awady

TL;DR

The paper tackles inverse materials design by enforcing material viability through a modified $1$-hot representation that preserves decomposition. It develops a variational autoencoder with latent dimension $n=10$, trained on Materials Project data encoded as length-$89$ vectors and optimized with the negative ELBO objective; decoder outputs are thresholded at $T=0.04$ and post-processed to produce discrete formulas. Results show that the latent space largely preserves decomposition properties (high cosine similarity between observed and reconstructed component vectors) and that the generated materials match the data's element prevalence with a modest KL divergence (≈$0.08$). The approach supports sequential inverse design by letting RL policies operate in a latent space where compositional changes map to linear latent manipulations, which may make viability constraints easier to respect during material discovery.
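As a rough illustration of the thresholding step, the sketch below assumes that the modified 1-hot vector stores each element's atomic fraction and that post-processing zeroes entries below $T$ and renormalizes the rest. Both are assumptions made for this example, not the paper's confirmed procedure; the five-element ordering `ELEMENTS` and the helper names are hypothetical (the paper's vectors have 89 entries).

```python
import numpy as np

# Hypothetical element ordering for illustration; the paper's vectors have 89 entries.
ELEMENTS = ["Ga", "Mo", "S", "O", "Fe"]
T = 0.04  # threshold quoted in the summary


def encode(composition):
    """Map {element: atom count} to a vector of atomic fractions (assumed encoding)."""
    vec = np.zeros(len(ELEMENTS))
    for element, count in composition.items():
        vec[ELEMENTS.index(element)] = count
    return vec / vec.sum()


def threshold(vec, t=T):
    """Zero entries below t, then renormalize what remains (assumed post-processing)."""
    kept = np.where(vec >= t, vec, 0.0)
    total = kept.sum()
    return kept / total if total > 0 else kept


x = encode({"Ga": 1, "Mo": 4, "S": 8})  # Ga(MoS2)4
# Stand-in for a noisy decoder output: small spurious mass on O and Fe.
noisy = x + np.array([0.0, 0.0, 0.0, 0.02, 0.01])
clean = threshold(noisy)
```

In this toy case the spurious O and Fe entries fall below $T$ and are dropped, while Ga (fraction $1/13 \approx 0.077$) survives, so `clean` recovers the original fractions.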

Abstract

We investigate the construction of generative models capable of encoding physical constraints that can be hard to express explicitly. For the problem of inverse material design, where one seeks to design a material with a prescribed set of properties, a significant challenge is ensuring the synthetic viability of a proposed new material. We encode an implicit dataset relationship, namely that certain materials can be decomposed into other materials in the dataset, and present a VAE model capable of preserving this property in the latent space and of generating new samples that share it. This is particularly useful in sequential inverse material design, an emerging research area that seeks to design a material with specific properties by sequentially adding (or removing) elements using policies trained through deep reinforcement learning.
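The decomposition relationship can be made concrete in input space with a minimal sketch, again assuming that a material is represented by its vector of atomic fractions (an assumption for illustration; the element ordering and helper names are hypothetical). Under that assumption, Ga(MoS$_2$)$_4$ is an exact weighted sum of its components Ga and MoS$_2$, and cosine similarity is one way to check how well a recombination matches the whole.

```python
import numpy as np

ELEMENTS = ["Ga", "Mo", "S"]  # hypothetical ordering for illustration


def fractions(composition):
    """Vector of atomic fractions for {element: atom count} (assumed encoding)."""
    vec = np.array([composition.get(e, 0) for e in ELEMENTS], dtype=float)
    return vec / vec.sum()


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


whole = fractions({"Ga": 1, "Mo": 4, "S": 8})  # Ga(MoS2)4, 13 atoms in total
ga = fractions({"Ga": 1})
mos2 = fractions({"Mo": 1, "S": 2})

# Weights are each component's share of the atoms: 1 of 13 for Ga, 12 of 13 for MoS2.
recombined = (1 / 13) * ga + (12 / 13) * mos2
similarity = cosine_similarity(whole, recombined)
```

In input space the match is exact. The summary's claim concerns the analogous comparison after encoding, where the similarity is reported as high rather than perfect.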

Paper Structure (7 sections, 2 equations, 8 figures)

Figures (8)

  • Figure 1: Examples of the SMILES representation of a material.
  • Figure 2: Schematic representation of the modified 1-hot vector representation for $\hbox{Ga}(\hbox{Mo}\hbox{S}_2)_4$.
  • Figure 3: Example material entry in the Materials Project database: $\hbox{Ga}(\hbox{Mo}\hbox{S}_2)_4$.
  • Figure 4: Model architecture.
  • Figure 5: Impact of the size of the hidden layer in the VAE on the model performance. We choose 100 hidden nodes as the 'optimal' value.
  • ...and 3 more figures