Tensor Completion for Surrogate Modeling of Material Property Prediction

Shaan Pakala, Dawon Ahn, Evangelos Papalexakis

TL;DR

The paper tackles material property prediction under data sparsity by recasting it as a tensor completion problem. It builds tensor datasets whose modes correspond to elements and their stoichiometric counts, and applies CPD-based and neural tensor completion models (e.g., CPD-PARAFAC, CPD-S, NeAT) to infer unobserved properties. Across magnetization, formation energy, and band gap tasks, tensor completion methods achieve roughly a 10-20% reduction in mean absolute error (MAE) compared with strong non-tensor baselines, at similar training speed. This enables data-efficient inference of material properties, potentially accelerating the design of new materials with desirable characteristics.

Abstract

When designing materials to optimize certain properties, there are often many possible design configurations to explore. For example, a material's elemental composition affects properties such as strength or conductivity, which must be known when developing new materials. Exploring all combinations of elements to find optimal materials becomes very time-consuming, especially as the number of design variables grows. For this reason, there is growing interest in using machine learning (ML) to predict a material's properties. In this work, we model the optimization of certain material properties as a tensor completion problem, to leverage the structure of our datasets and navigate the vast number of possible material configurations. Across a variety of material property prediction tasks, our experiments show tensor completion methods achieving 10-20% lower error than baseline ML models such as GradientBoosting and Multilayer Perceptron (MLP), while maintaining similar training speed.
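
As a rough illustration of what tensor completion means here, the sketch below fits a plain rank-R CP (CANDECOMP/PARAFAC) model to the observed entries of a small synthetic fourth-order tensor using ridge-regularized alternating least squares, then predicts held-out entries. It is not the paper's CPD-S or NeAT implementation; the function names, tensor shape, rank, regularization strength, and synthetic data are all assumptions made for this example.

```python
import numpy as np

def cp_predict(factors, idx):
    """Evaluate a CP model at the entries listed in idx (one row per entry)."""
    prod = np.ones((idx.shape[0], factors[0].shape[1]))
    for mode, F in enumerate(factors):
        prod *= F[idx[:, mode]]      # each entry's factor row for this mode
    return prod.sum(axis=1)          # sum over the rank-one components

def init_factors(shape, rank, seed=0):
    """Random positive starting factors, one (dim, rank) matrix per mode."""
    rng = np.random.default_rng(seed)
    return [rng.uniform(0.1, 1.0, size=(d, rank)) for d in shape]

def fit_cp_als(shape, idx, vals, rank=2, reg=1e-3, sweeps=20, seed=0):
    """Fit CP factors to observed entries only, one factor row at a time."""
    factors = init_factors(shape, rank, seed)
    for _ in range(sweeps):
        for m in range(len(shape)):
            # Per observed entry: elementwise product of the other modes' rows.
            others = np.ones((len(vals), rank))
            for k, F in enumerate(factors):
                if k != m:
                    others *= F[idx[:, k]]
            for i in range(shape[m]):
                sel = idx[:, m] == i
                if not sel.any():
                    continue  # no observation touches this row; keep it as is
                A = others[sel]
                gram = A.T @ A + reg * np.eye(rank)  # ridge term keeps it solvable
                factors[m][i] = np.linalg.solve(gram, A.T @ vals[sel])
    return factors

def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))

# Synthetic demo (assumed data): a rank-2 tensor with 60% of entries observed.
shape, rank = (4, 4, 3, 3), 2
rng = np.random.default_rng(1)
true_factors = [rng.uniform(0.5, 1.5, size=(d, rank)) for d in shape]
all_idx = np.array(list(np.ndindex(*shape)))
all_vals = cp_predict(true_factors, all_idx)

perm = rng.permutation(len(all_vals))
n_obs = int(0.6 * len(all_vals))
obs, held = perm[:n_obs], perm[n_obs:]
obs_idx, obs_vals = all_idx[obs], all_vals[obs]

factors = fit_cp_als(shape, obs_idx, obs_vals, rank=rank)
mae_before = mae(cp_predict(init_factors(shape, rank), obs_idx), obs_vals)
mae_after = mae(cp_predict(factors, obs_idx), obs_vals)
mae_heldout = mae(cp_predict(factors, all_idx[held]), all_vals[held])
print(f"train MAE before/after: {mae_before:.3f} / {mae_after:.3f}; "
      f"held-out MAE: {mae_heldout:.3f}")
```

The held-out entries are never used during fitting, so the held-out MAE is the quantity closest to how a surrogate model for unobserved materials would be judged.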

Paper Structure

This paper contains 12 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Example fourth-order tensor representing materials' elements and the elements' ratios. The highlighted entry could correspond to the chemical formula AuBr$_{5}$, since it has indices Au and Br for the elements and indices 1 and 5 for the numbers of atoms in the material.
  • Figure 2: Predicted and actual values for random samples. Overall, CPD-S is reliable: even with limited training samples in Task 4, it still distinguishes between low and high band gap values.
  • Figure 3: MAE and train time for each model as a function of the number of training entries. Across a range of training sizes, CPD-S performs well without requiring more time.
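
The indexing described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element list, the maximum count, the mode order (element, element, count, count), and the convention that a count of k maps to zero-based index k - 1 are all hypothetical choices, not details taken from the paper.

```python
import numpy as np

# Hypothetical, tiny element vocabulary; a real dataset would define its own.
ELEMENTS = ["Au", "Br", "Fe", "O"]
MAX_COUNT = 5  # assumed largest number of atoms per element in a formula

ELEMENT_INDEX = {symbol: i for i, symbol in enumerate(ELEMENTS)}

def entry_index(elem_a, count_a, elem_b, count_b):
    """Map a binary compound to (element A, element B, count A, count B) indices."""
    for count in (count_a, count_b):
        if not 1 <= count <= MAX_COUNT:
            raise ValueError(f"count {count} is outside 1..{MAX_COUNT}")
    # Counts start at 1, array indices at 0, so shift each count by one.
    return (ELEMENT_INDEX[elem_a], ELEMENT_INDEX[elem_b], count_a - 1, count_b - 1)

# Fourth-order tensor; NaN marks entries whose property value is unobserved.
shape = (len(ELEMENTS), len(ELEMENTS), MAX_COUNT, MAX_COUNT)
tensor = np.full(shape, np.nan)

au_br5 = entry_index("Au", 1, "Br", 5)
tensor[au_br5] = 0.0  # placeholder value only, not a real measurement
print(au_br5, int(np.isnan(tensor).sum()), "entries still unobserved")
```

In this layout, every formula with a known property fills one cell, and tensor completion amounts to estimating the cells that remain NaN.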