Tensor Methods: A Unified and Interpretable Approach for Material Design
Shaan Pakala, Aldair E. Gongora, Brian Giera, Evangelos E. Papalexakis
TL;DR
The paper proposes tensor completion, particularly Canonical Polyadic Decomposition (CPD), as an interpretable surrogate modeling approach for material design to navigate large design spaces and handle non-uniform sampling. It demonstrates that CPD-based models yield interpretable tensor factors that align with underlying physics and can rediscover known phenomena, while maintaining competitive predictive performance against traditional ML baselines. Across three design datasets, tensor methods show robust generalization in biased sampling scenarios and, in some cases, outperform standard models in out-of-distribution regions. The work highlights the trade-offs between interpretability and predictive power and provides empirical evidence that tensor factors can serve as a practical tool for experimentalists to identify patterns and potentially novel materials. Overall, tensor completion emerges as a practical, interpretable, and generalizable surrogate modeling paradigm for material design, with public code and clear implications for biased-sampling environments.
Abstract
When designing new materials, it is often necessary to tailor the design parameters so that the material has desired properties (e.g., a target Young's modulus). As the set of design parameters grows, the search space grows exponentially, making the synthesis and evaluation of all material combinations virtually impossible. Even traditional computational methods such as Finite Element Analysis become too computationally expensive to search the design space. Recent methods use machine learning (ML) surrogate models to determine optimal material designs more efficiently; unfortunately, these methods often (i) are notoriously difficult to interpret and (ii) underperform when the training data comes from a non-uniform sampling of the design space. We suggest the use of tensor completion methods as an all-in-one approach for interpretability and prediction. We observe that classical tensor methods are able to compete with traditional ML in predictions, with the added benefit of interpretable tensor factors (which come for free, as a byproduct of the prediction). In our experiments, we are able to rediscover physical phenomena via the tensor factors, indicating that our predictions are aligned with the true underlying physics of the problem. Because the factors recover existing patterns, experimentalists could also use them to identify potentially novel ones. We also study how both types of surrogate models behave when the training data comes from a non-uniform sampling of the design space. We observe that more specialized tensor methods can give better generalization in these non-uniform sampling scenarios. We find the best generalization comes from a tensor model, which improves upon the baseline ML methods by up to 5% on aggregate $R^2$ and halves the error in some out-of-distribution regions.
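To make the idea of tensor completion with a Canonical Polyadic Decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes a three-way design tensor whose axes are three discretized design parameters and whose entries are a measured property, with only some entries observed. The function names (`cp_reconstruct`, `cp_complete`), the masked alternating least squares solver, the ridge term, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def cp_reconstruct(factors):
    """Rebuild a 3-way tensor from its CPD factor matrices A, B, C."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_complete(X, mask, rank, n_iters=200, ridge=1e-3, seed=0):
    """Fit a rank-`rank` CPD to the observed entries of X (mask == True).

    Hypothetical helper: masked alternating least squares with a small
    ridge penalty. Each row of each factor is refit using only the
    observed entries that involve that row.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Bring the mode being updated to the front.
            Xm = np.moveaxis(X, mode, 0)
            Mm = np.moveaxis(mask, mode, 0)
            others = [factors[m] for m in range(3) if m != mode]
            # Z[p, q, :] is the elementwise product of row p and row q
            # of the two factors that are held fixed.
            Z = np.einsum("pr,qr->pqr", others[0], others[1])
            for i in range(X.shape[mode]):
                obs = Mm[i]
                if not obs.any():
                    continue  # no data for this row; keep current values
                Zi = Z[obs]        # shape (n_observed, rank)
                xi = Xm[i][obs]    # shape (n_observed,)
                gram = Zi.T @ Zi + ridge * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, Zi.T @ xi)
    return factors


# Synthetic example: an exactly rank-2 tensor with about half the entries observed.
rng = np.random.default_rng(42)
true_factors = [rng.standard_normal((10, 2)) for _ in range(3)]
X_true = cp_reconstruct(true_factors)
mask = rng.random(X_true.shape) < 0.5

factors = cp_complete(np.where(mask, X_true, 0.0), mask, rank=2)
X_hat = cp_reconstruct(factors)

# R^2 on the entries that were never seen during fitting.
held_out = ~mask
ss_res = np.sum((X_true[held_out] - X_hat[held_out]) ** 2)
ss_tot = np.sum((X_true[held_out] - X_true[held_out].mean()) ** 2)
r2_heldout = 1.0 - ss_res / ss_tot
print(f"held-out R^2: {r2_heldout:.3f}")
```

In this sketch, each column of a factor matrix describes how one latent component varies along one design parameter, which is the sense in which the factors can be inspected alongside the predictions. The same fitted factors that fill in the missing entries are the objects one would plot to look for patterns.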
