A critical assessment of bonding descriptors for predicting materials properties
Aakash Ashok Naik, Nidal Dhamrait, Katharina Ueltzen, Christina Ertural, Philipp Benner, Gian-Marco Rignanese, Janine George
TL;DR
This work critically evaluates quantum-chemical bonding descriptors as features for predicting solid-state material properties. It expands a high-throughput bonding database to roughly 13,000 materials and derives new descriptors from LOBSTER outputs. Using all-relevant feature selection, distance-correlation analyses, SHAP, and corrected resampling t-tests across Random Forest and MODNet models, the study shows that bonding descriptors provide complementary, target-dependent predictive value. The benefit is most pronounced for bond-stiffness, lattice thermal conductivity, and elasticity-related properties, and limited for global thermodynamic quantities. Symbolic regression with SISSO yields interpretable, physically intuitive relationships between bonding descriptors and key targets (e.g., max_pfc correlates with the strongest bond length ratio; log_klat_300 is tied to bonding heterogeneity and volume per atom). The findings suggest pathways to faster surrogate models and bonding-informed representations (including GNNs) for efficient materials discovery, and indicate that the largest gains arise for directional or local properties rather than averaged thermodynamic quantities.
Abstract
Most machine learning models for materials science rely on descriptors based on materials compositions and structures, even though the chemical bond has proven to be a valuable concept for predicting materials properties. Over the years, various theoretical frameworks have been developed to characterize bonding in solid-state materials. However, integrating bonding information from these frameworks into machine learning pipelines at scale has been limited by the lack of a systematically generated and validated database. Recent advances in high-throughput bonding analysis workflows have addressed this issue, and our previously computed Quantum-Chemical Bonding Database for Solid-State Materials has been extended to include approximately 13,000 materials. This database is then used to derive a new set of quantum-chemical bonding descriptors. A systematic assessment is performed using statistical significance tests to evaluate how the inclusion of these descriptors influences the performance of machine-learning models that otherwise rely solely on structure- and composition-derived features. Models are built to predict elastic, vibrational, and thermodynamic properties typically associated with chemical bonding in materials. The results demonstrate that incorporating quantum-chemical bonding descriptors not only improves predictive performance but also helps identify intuitive expressions for properties such as the projected force constant and lattice thermal conductivity via symbolic regression.
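
To illustrate the kind of statistical significance test referred to here, the following is a minimal sketch of a corrected resampled t-statistic, assuming the corrected resampling t-test named in the TL;DR is the widely used Nadeau-Bengio variance correction. It is not the authors' implementation: the function name, the split sizes, and the toy score differences are hypothetical. The correction inflates the variance term by the test-to-train size ratio, because scores from overlapping resampled training sets are not independent.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t-statistic (Nadeau-Bengio style correction).

    diffs   : per-split score differences, e.g. error(model without bonding
              descriptors) minus error(model with bonding descriptors)
    n_train : number of training samples in each split
    n_test  : number of test samples in each split
    Returns the t-statistic and its degrees of freedom (k - 1).
    """
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two resampling splits")
    d_bar = mean(diffs)
    s2 = variance(diffs)  # unbiased sample variance (divides by k - 1)
    if s2 == 0:
        raise ValueError("zero variance across splits; statistic undefined")
    # A naive paired t-test would use s2 / k; the correction adds n_test / n_train.
    t_stat = d_bar / math.sqrt((1.0 / k + n_test / n_train) * s2)
    return t_stat, k - 1


# Hypothetical toy data: three splits with an 80/20 train/test partition.
diffs = [0.1, 0.2, 0.3]
t_stat, dof = corrected_resampled_t(diffs, n_train=80, n_test=20)
t_naive = mean(diffs) / math.sqrt(variance(diffs) / len(diffs))
print(f"corrected t = {t_stat:.3f}, naive t = {t_naive:.3f}, dof = {dof}")
```

The corrected statistic is always smaller in magnitude than the naive paired one, so an apparent improvement from added descriptors is less likely to be declared significant by chance. A p-value would follow from a Student t distribution with the returned degrees of freedom.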
