Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder

Cheng Zeng; Zulqarnain Khan; Nathan L. Post

TL;DR

This work tackles the problem of entangled latent representations in inverse materials design by introducing a semi-supervised disentangled variational autoencoder (DVAE) that learns a probabilistic mapping between material features, latent factors, and the target property. By incorporating expert-informed priors and a physics-informed feature transform, the model disentangles the target property (single-phase formation) from the other latent factors, while drawing on both labelled and unlabelled data for data efficiency. On high-entropy alloys, the model shows strong predictive performance (test AUC up to ~0.89) and reliable reconstruction (MAE ~2–6%), and its latent space captures meaningful structure, including associations with the number of elements and with element grouping. The framework supports three inverse-design workflows (high-throughput screening, latent-space design, and iterative inversion) plus post-hoc interpretability via SHAP, offering a practical, interpretable path towards multi-property materials design, with potential extensions to other material representations and target properties.
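The SHAP analysis mentioned above attributes the classification head's predicted single-phase probability to individual input features using Shapley values. As an illustration only, the sketch below computes exact Shapley values by enumerating every feature subset, filling "absent" features from a fixed baseline composition (one of several possible conventions). The logistic `head`, its weights `w` and the `baseline` composition are hypothetical placeholders, not the paper's trained model or its actual input features.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of scalar f at x; 'absent' features take baseline values."""
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = baseline.copy()
        for j in subset:
            z[j] = x[j]  # features in the coalition come from x
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in itertools.combinations(others, size):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi

# Hypothetical logistic classification head over a 4-element composition
w = np.array([1.5, -0.8, 0.3, 2.0])

def head(c):
    return 1.0 / (1.0 + np.exp(-(c @ w - 0.5)))  # toy P(single phase)

x = np.array([0.25, 0.25, 0.25, 0.25])      # equimolar alloy to explain
baseline = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical reference composition
phi = exact_shapley(head, x, baseline)
print(phi, phi.sum(), head(x) - head(baseline))
```

The attributions sum to head(x) - head(baseline) (the efficiency property), and for a purely linear head they reduce to w_i (x_i - baseline_i). Enumeration needs O(2^n) model evaluations, so for realistic feature sets libraries such as `shap` estimate these values instead, for example with `KernelExplainer`.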

Abstract

Inverse materials design has proven successful in accelerating the discovery of novel materials. Many inverse materials design methods use unsupervised learning, in which a latent space is learned to provide a compact description of material representations. A latent space learned in this way is likely to be entangled: the target property is mixed with other properties of the materials, which makes the inverse design process ambiguous. Here, we present a semi-supervised learning approach based on a disentangled variational autoencoder that learns a probabilistic relationship between features, latent variables and target properties. The approach is data efficient because it combines all labelled and unlabelled data in a coherent manner, and it uses expert-informed prior distributions to improve model robustness even when labelled data are limited. It is inherently interpretable, because the learnable target property is disentangled from the other properties of the materials, and a post-hoc analysis of the model's classification head provides an extra layer of interpretability. We demonstrate the approach on an experimental high-entropy alloy dataset, with chemical compositions as input and single-phase formation as the single target property. High-entropy alloys were chosen as example materials because of the vast chemical space spanned by their possible compositions and atomic configurations. Although a single target property is used in this work, the disentangled model can be extended to the inverse design of materials with multiple target properties.
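The claim that labelled and unlabelled data are combined "in a coherent manner" can be made concrete with a semi-supervised VAE objective. The sketch below assumes the "M2" objective of Kingma et al. (2014) rather than the paper's exact formulation, which may differ: labelled alloys contribute an ELBO for log p(x, y) plus a weighted classification term alpha * log q(y|x), while for unlabelled alloys the binary single-phase label is marginalised out under q(y|x) and an entropy term is added. The linear encoder, classifier and decoder, the softmax composition likelihood, the Bernoulli prior `pi` (a stand-in for an expert-informed prior) and all parameter names are hypothetical.

```python
import numpy as np

EPS = 1e-7  # guards log(0) in the classifier terms

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    return a - np.log(np.exp(a).sum(axis=-1, keepdims=True))

def gaussian_kl(mu, logvar):
    # KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def elbo_given_y(x, y, params, rng):
    """Single-sample ELBO of log p(x, y) for compositions x (n, d) and labels y (n,)."""
    xy = np.column_stack([x, y])
    mu, logvar = xy @ params["W_mu"], xy @ params["W_lv"]  # hypothetical q(z | x, y)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterisation
    logits = np.column_stack([z, y]) @ params["W_dec"]  # hypothetical p(x | y, z)
    recon_ll = np.sum(x * log_softmax(logits), axis=-1)  # composition as soft categorical
    log_py = np.where(y == 1, np.log(params["pi"]), np.log(1.0 - params["pi"]))
    return recon_ll + log_py - gaussian_kl(mu, logvar)

def semi_supervised_loss(x_lab, y_lab, x_unl, params, rng, alpha=1.0):
    """Negative M2-style objective averaged over labelled and unlabelled alloys."""
    q_lab = np.clip(sigmoid(x_lab @ params["w_cls"]), EPS, 1 - EPS)  # q(y=1 | x)
    log_qy = np.where(y_lab == 1, np.log(q_lab), np.log(1 - q_lab))
    labelled = elbo_given_y(x_lab, y_lab, params, rng) + alpha * log_qy

    n = len(x_unl)
    q1 = np.clip(sigmoid(x_unl @ params["w_cls"]), EPS, 1 - EPS)
    elbo1 = elbo_given_y(x_unl, np.ones(n), params, rng)   # assume single phase
    elbo0 = elbo_given_y(x_unl, np.zeros(n), params, rng)  # assume multi-phase
    entropy = -(q1 * np.log(q1) + (1 - q1) * np.log(1 - q1))
    unlabelled = q1 * elbo1 + (1 - q1) * elbo0 + entropy

    return -(labelled.sum() + unlabelled.sum()) / (len(x_lab) + n)

# Toy usage with random parameters (5 elements, 2 latent dimensions; sizes hypothetical)
rng = np.random.default_rng(0)
d, k = 5, 2
params = {
    "W_mu": 0.1 * rng.standard_normal((d + 1, k)),
    "W_lv": 0.1 * rng.standard_normal((d + 1, k)),
    "W_dec": 0.1 * rng.standard_normal((k + 1, d)),
    "w_cls": 0.1 * rng.standard_normal(d),
    "pi": 0.5,  # hypothetical prior P(single phase)
}
x_lab = rng.dirichlet(np.ones(d), size=8)   # compositions sum to 1
y_lab = rng.integers(0, 2, size=8)
x_unl = rng.dirichlet(np.ones(d), size=20)
loss = semi_supervised_loss(x_lab, y_lab, x_unl, params, rng)
print(f"loss = {loss:.3f}")
```

This sketch only evaluates the objective; in practice the parameters would be fitted by gradient descent with an automatic-differentiation framework. Every unlabelled alloy still contributes to the generative terms, which is where the data efficiency of a semi-supervised model comes from.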

Paper Structure (19 sections, 6 equations, 8 figures, 3 tables)

Introduction
Methods & Theories
Dataset
Proposed Model: Disentangled VAE
Generative model
Recognition model
Model training
Post-hoc analysis: SHAP feature importance
Results and Discussion
Classification and Reconstruction Performance
Classification for single phase formation
Alloy reconstruction
Data efficiency
Latent space representation
Disentanglement from target property
(4 further sections not listed)

Figures (8)

Figure 1: Experimental high-entropy alloy dataset for single phase formation.
Figure 2: Generative model (left) and recognition model (right) of the disentangled variational autoencoder for inverse design of single-phase high-entropy alloys.
Figure 3: ROC curves for training, validation and test datasets.
Figure 4: Comparison between original alloys and reconstructed alloys across 138 test data points: (a) Composition vectors, (b) Predicted single-phase probability, and (c) Latent variables.
Figure 5: Disentangled latent representation: (a) data colored by the target property, and (b) data distribution shown as a kernel density plot. True labels for the HEAs are given in binary form, where '1' and '0' denote single-phase and multi-phase structures, respectively.
(Figures 6 to 8 not listed)
