Data-Efficient Machine Learning for Predicting Dopant Formation Energies in TiO$_2$ Monolayer
Kati Asikainen, Matti Alatalo, Marko Huttula, Assa Aravindh Sasikala Devi
TL;DR
This work addresses data scarcity in predicting dopant formation energies for a two-dimensional TiO$_2$ monolayer by combining density functional theory (DFT) with a compact, descriptor-based machine-learning (ML) framework. The approach builds a small, physics-informed dataset of Pt-doped configurations and then tests chemical transferability to Ag-doped systems. It achieves high predictive accuracy for Pt ($R^2$ up to about 0.9) and enables transfer to Ag with only limited Ag data. The CN_4Å_mean descriptor dominates the models, and Pt predictions remain robust when a small, targeted set of additional dopant data is incorporated. Overall, the framework enables data-efficient, transferable screening of dopants in doped TiO$_2$ monolayers, guiding design while minimizing computational cost.
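The CN_4Å_mean descriptor is not defined in this section. One plausible reading, assumed here, is a coordination number counted within a 4 Å cutoff and averaged over a chosen set of atoms (for example, the dopant and its neighbours). The minimal NumPy sketch below computes such a quantity for an orthorhombic cell, applying the minimum-image convention only along periodic axes (in-plane for a monolayer with vacuum along z). The functions `coordination_numbers` and `mean_cn` and the toy lattice are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def coordination_numbers(positions, cell_lengths, cutoff=4.0, pbc=(True, True, False)):
    """Number of neighbours within `cutoff` (Å) for every atom.

    Assumes an orthorhombic cell with edge lengths `cell_lengths` and applies
    the minimum-image convention along the axes flagged in `pbc`
    (in-plane only by default, as for a 2D monolayer with vacuum along z).
    """
    pos = np.asarray(positions, dtype=float)
    lengths = np.asarray(cell_lengths, dtype=float)
    periodic = np.asarray(pbc, dtype=bool)

    # Pairwise displacement vectors, shape (N, N, 3)
    disp = pos[:, None, :] - pos[None, :, :]
    # Wrap displacements into the nearest periodic image along periodic axes only
    disp[..., periodic] -= lengths[periodic] * np.round(disp[..., periodic] / lengths[periodic])
    dist = np.linalg.norm(disp, axis=-1)

    # Count pairs inside the cutoff, excluding each atom's pair with itself
    not_self = ~np.eye(len(pos), dtype=bool)
    return np.sum((dist <= cutoff) & not_self, axis=1)

def mean_cn(positions, cell_lengths, indices, cutoff=4.0, pbc=(True, True, False)):
    """Mean coordination number over the atoms in `indices` (hypothetical CN_4Å_mean)."""
    cn = coordination_numbers(positions, cell_lengths, cutoff, pbc)
    return float(np.mean(cn[list(indices)]))

# Toy example: a 4 x 4 square grid with 3.0 Å spacing in a 12 x 12 x 30 Å cell.
# Nearest neighbours sit at 3.0 Å; diagonal neighbours (about 4.24 Å) lie outside the cutoff.
spacing = 3.0
grid = np.array([[i * spacing, j * spacing, 15.0] for i in range(4) for j in range(4)])
cell = (12.0, 12.0, 30.0)

print(mean_cn(grid, cell, range(len(grid))))                          # 4.0 (periodic in-plane)
print(mean_cn(grid, cell, range(len(grid)), pbc=(False, False, False)))  # 3.0 (open edges)
```

Minimum-image wrapping is only valid when the cutoff is smaller than half of each periodic cell length, which should be checked for the supercell at hand; a neighbour-list routine from an established atomistic library would be a more general alternative.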
Abstract
Machine-learning models are increasingly applied in materials science, yet their predictive power is often constrained by data scarcity. Here, we show that accurate predictions can be achieved even with a limited number of training examples, provided the dataset is compact and grounded in physically relevant quantities. By combining density functional theory calculations with a machine-learning framework, we construct accurate descriptor-based models that predict the formation energies of doped lepidocrocite TiO$_2$ monolayers. The predictive accuracy of the models is first evaluated for single-dopant Pt configurations, demonstrating that the selected structural and chemical descriptors reliably capture the key factors governing dopant stability. Chemical transferability is then examined by extending the dataset to include Ag-doped configurations. Predictive accuracy improves systematically as more Ag-doped data points are included in training, while the performance for Pt remains robust. These results highlight the potential of small, well-curated datasets combined with physically informed descriptors to enable machine-learning-driven screening of doped TiO$_2$ monolayers that is not only accurate but also chemically transferable.
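The transferability experiment described above (train on Pt-doped data, progressively add Ag-doped points, and monitor $R^2$ for both dopants) can be sketched as follows. This is a minimal illustration under stated assumptions: closed-form ridge regression stands in for whatever model the authors use, the two descriptors and the dopant flag are hypothetical, and the data are synthetic and noiseless, generated from an assumed linear relation rather than from DFT. Any numbers it prints are therefore not results from this work.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-6):
    """Closed-form ridge regression; centring keeps the intercept unpenalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    return w, y_mean - x_mean @ w

def predict(model, X):
    w, b = model
    return np.asarray(X, dtype=float) @ w + b

def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# --- Synthetic stand-in data (NOT DFT results) ---
rng = np.random.default_rng(0)
# Hypothetical linear relation: two descriptors plus an Ag flag that shifts the energy by 0.5
TRUE_W, TRUE_B = np.array([-0.8, 0.3, 0.5]), 1.0

def make_set(n, is_ag):
    descriptors = rng.uniform(0.0, 1.0, size=(n, 2))
    flag = np.full((n, 1), 1.0 if is_ag else 0.0)
    X = np.hstack([descriptors, flag])
    return X, X @ TRUE_W + TRUE_B

X_pt_train, y_pt_train = make_set(30, is_ag=False)
X_pt_test, y_pt_test = make_set(10, is_ag=False)
X_ag_pool, y_ag_pool = make_set(10, is_ag=True)   # Ag points that can be added to training
X_ag_test, y_ag_test = make_set(10, is_ag=True)   # held-out Ag points

def transfer_curve(n_ag_values, lam=1e-6):
    """R^2 on held-out Pt and Ag data as n Ag points are added to the Pt training set."""
    results = {}
    for n in n_ag_values:
        X = np.vstack([X_pt_train, X_ag_pool[:n]])
        y = np.concatenate([y_pt_train, y_ag_pool[:n]])
        model = fit_ridge(X, y, lam)
        results[n] = (r2_score(y_pt_test, predict(model, X_pt_test)),
                      r2_score(y_ag_test, predict(model, X_ag_test)))
    return results

curve = transfer_curve([0, 2, 5, 10])
for n, (r2_pt, r2_ag) in curve.items():
    print(f"Ag points in training: {n:2d}   R2(Pt) = {r2_pt:.3f}   R2(Ag) = {r2_ag:.3f}")
```

A feature that distinguishes the dopant is what lets the added Ag points correct the model: with no Ag data, the flag column is constant in training and the Ag-specific shift cannot be learned. In a real setting, chemically meaningful dopant descriptors would presumably play this role instead of a bare flag.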
