Learning to predict superconductivity
Omri Lesser, Yanjun Liu, Natalie Maus, Aaditya Panigrahi, Krishnanand Mallayya, Leslie M. Schoop, Jacob R. Gardner, Eun-Ah Kim
TL;DR
The paper tackles the challenge of predicting superconductivity with a data-driven featurization that fuses structural information and elemental properties through graphlet histograms and symmetry indicators derived from the 3DSC CIF database. It introduces a novel Earth Mover's Distance kernel for Gaussian-process learning on histogram-based features, with an explicit proof of kernel validity, and demonstrates high accuracy ($R^2 \approx 0.93$) for $T_c$ prediction along with uncertainty estimates. A striking finding is that a four-feature subset, dominated by the electron affinity difference between neighboring atoms, nearly saturates predictive performance, pointing to a universal, chemistry-driven descriptor for $T_c$. The work also delivers a superconductivity classifier with quantified uncertainties and shows that the framework can rapidly screen inorganic crystals and can be extended to material properties beyond superconductivity.
Abstract
Predicting the superconducting transition temperature ($T_c$) of materials remains a major challenge in condensed matter physics due to the lack of a comprehensive and quantitative theory. We present a data-driven approach that combines chemistry-informed feature extraction with interpretable machine learning to predict $T_c$ and classify superconducting materials. We develop a systematic featurization scheme that integrates structural and elemental information through graphlet histograms and symmetry vectors. Using experimentally validated structural data from the 3DSC database, we construct a curated, featurized dataset and design a new kernel to incorporate histogram features into Gaussian-process (GP) regression and classification. This framework yields an interpretable $T_c$ predictor with an $R^2$ value of 0.93 and a superconductor classifier with quantified uncertainties. Feature-significance analysis further reveals that the GP $T_c$ predictor achieves near-optimal performance using only four second-order graphlet features. In particular, we identify a previously overlooked feature, the electron affinity difference between neighboring atoms, as a universally predictive descriptor. Our graphlet-histogram approach not only highlights bonding-related elemental descriptors as unexpectedly powerful predictors of superconductivity but also provides a broadly applicable framework for predictive modeling of diverse material properties.
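
As a rough illustration of the ideas summarized above, the following sketch builds a histogram feature from neighbor-pair differences of an elemental property and compares histograms with an Earth Mover's Distance (EMD) based kernel. It is not the authors' implementation: the abstract does not specify the kernel's form, so the exponential form $\exp(-\mathrm{EMD}/\ell)$, the function names, the bin edges, the toy site values, and the neighbor lists are all assumptions made for illustration. For normalized 1D histograms on shared, equally spaced bins, the EMD equals the bin width times the L1 distance between cumulative sums, so this particular form is a Laplacian kernel on the cumulative vectors and is positive semidefinite.

```python
import numpy as np


def pair_difference_histogram(values, edges, bin_edges):
    """Normalized histogram of |v_i - v_j| over neighbor pairs (i, j).

    `values` holds one elemental property per atomic site and `edges`
    lists neighboring site pairs. Both are hypothetical toy inputs here.
    """
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=int)
    diffs = np.abs(values[edges[:, 0]] - values[edges[:, 1]])
    counts, _ = np.histogram(diffs, bins=bin_edges)
    total = counts.sum()
    if total == 0:
        raise ValueError("no pair differences fall inside the bin range")
    return counts / total


def emd_kernel_matrix(hists, length_scale=1.0, bin_width=1.0):
    """Assumed kernel k(p, q) = exp(-EMD(p, q) / length_scale).

    For normalized 1D histograms on shared, equally spaced bins, EMD is
    bin_width times the L1 distance between the cumulative sums.
    """
    hists = np.asarray(hists, dtype=float)
    cdfs = np.cumsum(hists, axis=1)
    dist = bin_width * np.abs(cdfs[:, None, :] - cdfs[None, :, :]).sum(axis=-1)
    return np.exp(-dist / length_scale)


# Toy "crystals": placeholder site values (NOT real electron affinities)
# and hand-written neighbor lists.
bin_edges = np.linspace(0.0, 2.0, 5)  # four bins of width 0.5
toy_crystals = [
    ([0.1, 1.3, 0.4], [(0, 1), (1, 2), (0, 2)]),
    ([0.2, 0.3, 1.9], [(0, 1), (1, 2)]),
    ([1.0, 1.1, 1.2, 0.0], [(0, 1), (1, 2), (2, 3), (0, 3)]),
]

hists = np.array(
    [pair_difference_histogram(v, e, bin_edges) for v, e in toy_crystals]
)
K = emd_kernel_matrix(hists, length_scale=1.0, bin_width=0.5)

# Sanity check: the Gram matrix should have no significantly negative eigenvalues.
min_eig = np.linalg.eigvalsh(K).min()
print("Gram matrix:\n", K)
print("smallest eigenvalue:", min_eig)
```

A Gram matrix of this kind could be passed to any GP implementation that accepts precomputed kernels. The length scale would normally be tuned like any other hyperparameter.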
