
Chalcogen Impurity Barriers in 2D Systems via Semi-Empirical/Machine Learning Modeling: A Survey over 4000 Materials

M. L. Pereira Junior, M. G. E. da Luz, P. Cesana, A. L. da Rosa, M. J. Piotrowski, D. Guedes-Sobrinho, T. A. S. Pereira, E. A. Moujaes, A. C. Dias, R. M. Tromer

TL;DR

A data-driven approach that integrates the semi-empirical Extended Hückel Method (EHM) with machine learning to estimate adsorption energy barriers for three chalcogen impurities (S, Se, and Te). Combined with interpretable ML protocols, EHM yields a scalable framework for selecting 2D structures with the capture/release dynamics required by a range of applications.

Abstract

Adequate characterization of two-dimensional materials with low energy barriers for impurity adsorption is key for advancing applications based on catalysis, sensing, and surface functionalization. However, first-principles methods such as DFT are often too computationally expensive for large-scale screenings. To address this, we present a data-driven approach that integrates the semi-empirical Extended Hückel Method (EHM) with machine learning (ML) techniques to estimate adsorption energy barriers for three relevant chalcogen impurities: S, Se, and Te. To this end, we consider the 4036 2D materials found in the C2DB. The scheme employs the EHM to compute energy profiles along three in-plane migration paths, from which average barriers are derived. The equilibrium distance between the impurity and the 2D surface is not calculated from a time-consuming geometry optimization; instead, it is estimated from a simple effective phenomenological expression. Physicochemical descriptors are then obtained from the Matminer library and curated into a feature set. Four different ML models are tested, with XGBoost achieving the highest performance. We further use SHAP to interpret and verify the resulting predictions, focusing on the $\sim1,500$ materials displaying the lowest barrier values. As could be anticipated, we find that the average valence electron count, electronegativity, and atomic number are typically the most relevant attributes for the ML model. We are also able to determine, for each chalcogen atom, which few other descriptors considerably influence the adsorption properties. Our results show that, when combined with interpretable ML protocols, EHM can provide a scalable framework for selecting 2D structures that exhibit the capture/release dynamics desired in a variety of applications.
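The step from energy profiles to an average barrier can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the abstract, that the barrier along one path is the difference between the maximum and minimum energy of the profile. The function names `path_barrier` and `average_barrier` and the numerical profiles are hypothetical and serve only as an illustration; they are not data or code from the paper.

```python
from statistics import mean


def path_barrier(energies):
    """Barrier (eV) along one migration path.

    Assumption for this sketch: barrier = max(E) - min(E) of the profile.
    """
    if not energies:
        raise ValueError("empty energy profile")
    return max(energies) - min(energies)


def average_barrier(profiles):
    """Mean barrier (eV) over all migration paths, e.g. x, y and xy."""
    return mean(path_barrier(e) for e in profiles.values())


# Hypothetical energy profiles (eV) along the three in-plane directions.
# These numbers are made up for illustration only.
profiles = {
    "x": [0.00, 0.25, 0.50, 0.25, 0.00],
    "y": [0.00, 0.50, 1.00, 0.50, 0.00],
    "xy": [0.00, 0.75, 1.50, 0.75, 0.00],
}

avg = average_barrier(profiles)
print(f"average barrier: {avg:.2f} eV")
```

In a screening setting, a value of this kind would be computed once per material and per impurity, and would then serve as the regression target for the ML models.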

Paper Structure (15 sections, 2 equations, 9 figures)

Figures (9)

  • Figure 1: Workflow summarizing the data gathering, calculations, ML models, and interpretative analyses used to characterize the adsorption of chalcogen atoms on the 2D materials catalogued in the C2DB database (comprising 4000+ distinct systems).
  • Figure 2: Schematic of the problem geometry. A 2D material from the C2DB database serves as the substrate for the adsorption of a chalcogen atom (S, Se, or Te). The effective equilibrium distance $d_{eq}$, used to calculate the energy barriers, is estimated from the phenomenological considerations in Sec. \ref{d-heuristic}.
  • Figure 3: Energy barrier profiles (in eV) for the chalcogen elements (a) S, (b) Se, and (c) Te adsorbed on graphene, computed with the EHM using the YAeHMOP software. In each case, the impurity was displaced along three directions over the graphene surface: $x$ (blue), $y$ (orange), and the diagonal $xy$ (green).
  • Figure 4: Histograms of average energy barriers for about 3150 2D systems from the C2DB database, for which the energies do not exceed 5.0 eV. The impurities are (a) S, (b) Se and (c) Te. The values represent the average over the three directions $x$, $y$, and $xy$.
  • Figure 5: Comparison between the calculated and ML-predicted energy barrier values ($\leq 2.0$ eV) for the S impurity adsorbed on materials of the C2DB database. The ML models are: Linear Regression (first row), Neural Network (second row), Decision Tree (third row), and XGBoost (fourth row). For each model, the training stage is shown on the left and the testing stage on the right. The dashed red line represents the identity line $y=x$. The performance metrics, namely the Mean Absolute Error, the Mean Squared Error, and the coefficient of determination $R^2$, are also shown.
  • ...and 4 more figures
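
The three performance metrics named in the Figure 5 caption (Mean Absolute Error, Mean Squared Error, and $R^2$) follow their standard definitions. The short Python sketch below computes them from paired calculated and predicted barriers. The helper name `regression_metrics` and the sample values are hypothetical and are not results from the paper.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    # R^2 is undefined when all reference values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, mse, r2


# Hypothetical calculated vs. predicted barriers (eV), for illustration only.
calculated = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]

mae, mse, r2 = regression_metrics(calculated, predicted)
print(f"MAE = {mae:.3f} eV, MSE = {mse:.3f} eV^2, R^2 = {r2:.3f}")
```

A point lying exactly on the identity line contributes zero to both error metrics, so a perfect model would give MAE = MSE = 0 and $R^2 = 1$.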