Physics-informed Hamiltonian learning for large-scale optoelectronic property prediction

Martin Schwade, Shaoming Zhang, Frederik Vonhoff, Frederico P. Delgado, David A. Egger

TL;DR

This work presents HAMSTER, a physics-informed machine learning framework for predicting the quantum-mechanical Hamiltonian of complex chemical systems, and demonstrates the power of physics-informed Hamiltonian learning for accurate and interpretable optoelectronic property prediction in large, complex systems.

Abstract

Predicting optoelectronic properties of large-scale atomistic systems under realistic conditions is crucial for rational materials design, yet computationally prohibitive with first-principles simulations. Recent neural network models have shown promise in overcoming these challenges, but typically require large datasets and lack physical interpretability. Physics-inspired approximate models offer greater data efficiency and intuitive understanding, but often sacrifice accuracy and transferability. Here we present HAMSTER, a physics-informed machine learning framework for predicting the quantum-mechanical Hamiltonian of complex chemical systems. Starting from an approximate model encoding essential physical effects, HAMSTER captures the critical influence of dynamic environments on Hamiltonians using only a few explicit first-principles calculations. We demonstrate our approach on halide perovskites, achieving accurate prediction of optoelectronic properties across temperature and compositional variations, and scalability to systems containing tens of thousands of atoms. This work highlights the power of physics-informed Hamiltonian learning for accurate and interpretable optoelectronic property prediction in large, complex systems.

Paper Structure

This paper contains 21 sections, 11 equations, 4 figures.

Figures (4)

  • Figure 1: Overview of the physics-informed Hamiltonian learning model and workflow with results for GaAs. a Illustration of pairwise and environment-dependent electronic interactions arising from structural fluctuations in a chemical system and their effect on the electronic structure captured via an effective Hamiltonian, $\hat{H}_\mathrm{eff}$, in the Hamster approach. It combines a physical model and environment-dependent machine learning terms. b Schematic visualization of the environment descriptor for the matrix element between atoms $i$ and $j$, which treats the local environments of the two atoms separately. Different atomic species are indicated by purple and gray colors. Atoms within a distance $r_\mathrm{cut}$, which is indicated by dashed circles, are labeled as $k_x$, with $x=1,2,3,\dots$. The $s$ and $p$ orbitals of selected atoms are shown schematically as red circles and red-blue handles, respectively. c Validation error, calculated as mean absolute error (MAE) across all eigenvalues, for a Hamster model trained on an increasing number of training structures for GaAs at 400 K. d Comparison of residuals (difference between model and DFT eigenvalues) of a pristine tight-binding (TB) model (orange) and the Hamster model (blue) with respect to density functional theory (DFT) data, averaged over $\mathbf{k}$-points and 100 snapshots, at 400 K. Dashed vertical lines indicate the valence band maximum (VBM).
  • Figure 2: Training and descriptor analysis for CsPbBr$_3$. a Dependence of training (dark orange) and validation (gray) loss (MAE between DFT and Hamster energy eigenvalues) on the number of training structures. The data point at zero training structures corresponds to the underlying TB model without ML corrections. b Principal component analysis of the kernel support points for 12 training structures. In both panels, we compare two different cutoff radii ($r_\mathrm{cut}$) for the interaction range, namely 5.5 Å and 6.2 Å.
  • Figure 3: Transferability of the model across temperatures and large-scale calculations. a Band gap of CsPbBr$_3$ computed with DFT (black) and Hamster (blue) for a $2\times2\times2$ supercell. Results obtained using Hamster for a $16\times16\times16$ supercell are shown as well (green). Thin vertical lines indicate the standard deviations. The gray line represents the slope of the fitting function derived from experimental data [mannino_temperaturedependent_2020]. b Comparison of residuals of a pristine TB model (orange) and the Hamster model (blue) with respect to DFT data, averaged over $\mathbf{k}$-points and 100 snapshots, for CsPbBr$_3$ at 425 K.
  • Figure 4: Band gap of MAPbBr$_3$ as a function of supercell size and temperature, with associated computational scaling. a Band gap computed with Hamster (blue) and DFT (black) for varying supercell sizes at a temperature of 300 K. The $x$-axis indicates the supercell dimension $n$, corresponding to an $n\times n \times n$ replication of the cubic cell. b Temperature-induced change in band gap, $\Delta E_\mathrm{g}$, referenced to the band-gap value at a temperature of 300 K, computed for varying supercell sizes (green: $4\times4\times4$; dark orange: $16\times16\times16$) with Hamster and compared to an experimentally determined fit function (gray) [mannino_temperaturedependent_2020]. c Runtime of Hamster evaluated on 10 structures as a function of cell size. Timings are shown for full diagonalization of the Hamiltonian (full diag; green stars), partial diagonalization of six eigenvalues around the VBM using the Lanczos algorithm (partial diag; orange diamonds), and construction and saving of the Hamiltonian without diagonalization (no diag; blue circles). A constant Julia compilation time of 44 s has been subtracted. Reference lines indicating linear and cubic scaling are shown in gray.
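
The contrast between full and partial diagonalization in Figure 4c can be illustrated with a small, self-contained sketch. It is not the Hamster implementation (which the caption indicates is written in Julia). It assumes a toy one-dimensional two-band chain as a stand-in for a sparse tight-binding Hamiltonian and uses SciPy's `eigsh`, an ARPACK-based Lanczos solver, in shift-invert mode. The helper names `chain_hamiltonian` and `eigenvalues_near`, the parameter values, and the mid-gap target energy are all hypothetical choices made for illustration. The last lines compute the MAE between the two sets of eigenvalues, the same kind of metric the captions use to compare model and DFT eigenvalues.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def chain_hamiltonian(n_sites, onsite=(-1.0, 1.0), hopping=0.4):
    """Toy sparse Hamiltonian (assumption): 1D chain with alternating
    on-site energies, which opens a gap between -1 and +1."""
    diag = np.where(np.arange(n_sites) % 2 == 0, onsite[0], onsite[1])
    off = np.full(n_sites - 1, hopping)
    return sp.diags([off, diag, off], offsets=[-1, 0, 1], format="csc")


def eigenvalues_near(H, target, k=6):
    """Return the k eigenvalues closest to `target` using shift-invert
    Lanczos; only a sparse factorization of (H - target*I) is needed."""
    vals = eigsh(H, k=k, sigma=target, which="LM", return_eigenvectors=False)
    return np.sort(vals)


H = chain_hamiltonian(200)

# Partial diagonalization: six eigenvalues around the mid-gap energy.
partial = eigenvalues_near(H, target=0.0, k=6)

# Full (dense) diagonalization for reference; cost grows cubically with size.
full = np.linalg.eigvalsh(H.toarray())
closest = np.sort(full[np.argsort(np.abs(full))[:6]])

# Mean absolute error between the two sets of band-edge eigenvalues.
mae = np.mean(np.abs(partial - closest))
print(partial, mae)
```

In this toy model the six returned values are the three highest valence-like and three lowest conduction-like levels, so their spread gives the gap directly. For large supercells only the sparse route remains affordable, which is consistent with the linear and cubic reference lines in the figure.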