Active learning potentials for first-principles phase diagrams using replica-exchange nested sampling

Nico Unglert, Michael Ketter, Georg K. H. Madsen

TL;DR

This work presents a fully automated framework that couples replica-exchange nested sampling (RENS) with active learning to generate representative training data and compute complete pressure–temperature phase diagrams from first principles. By leveraging a committee-based uncertainty measure and batch selection from RENS trajectories, the approach builds transferable neural-interatomic potentials at the r2SCAN level and maps phase boundaries for Si, Ge, and Ti across broad thermodynamic ranges. The results reproduce key phase boundaries and trends while highlighting finite-size and functional limitations, demonstrating an autonomous route to first-principles phase-diagram prediction. Overall, the method offers a scalable path toward automatic construction of accurate interatomic potentials and comprehensive phase diagrams without manual dataset curation.

Abstract

Accurate prediction of materials phase diagrams from first principles remains a central challenge in computational materials science. Machine-learning interatomic potentials can provide near-DFT accuracy at a fraction of the cost, but their reliability crucially depends on the availability of representative training data that span all relevant regions of the potential-energy surface. Here, we present a fully automated active-learning (AL) strategy based on replica-exchange nested sampling (RENS) for the generation of training data and the computation of complete pressure-temperature phase diagrams. In our framework, RENS acts as both the exploration engine and the acquisition mechanism: its intrinsic diversity and likelihood-constrained sampling ensure that the configurations selected for DFT labeling are both informative and thermodynamically representative. We apply the approach to silicon, germanium, and titanium using potentials trained at the r2SCAN level of theory. For all systems, the AL process converges within 10-15 iterations, yielding transferable potentials that reproduce known phase transitions and thermodynamic trends. These results demonstrate that RENS-based AL provides a general and autonomous route to constructing machine-learning interatomic potentials and predicting first-principles phase diagrams across broad thermodynamic conditions.
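The acquisition step described above (a committee-based uncertainty followed by batch selection from RENS trajectories) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `force_uncertainty` and `select_batch` are hypothetical, and the specific choices are assumptions made for illustration: the uncertainty is taken as the largest per-atom norm of the committee standard deviation of the forces, candidates are taken at a regular stride along a trajectory, and the batch keeps the most uncertain candidates.

```python
import math
from statistics import pstdev


def force_uncertainty(committee_forces):
    """Committee disagreement for one configuration (assumed definition).

    committee_forces[m][a][c] is force component c on atom a predicted by
    committee member m. Returns the maximum over atoms of the Euclidean norm
    of the per-component standard deviation across committee members.
    """
    n_atoms = len(committee_forces[0])
    per_atom = []
    for a in range(n_atoms):
        variance = 0.0
        for c in range(3):
            component = [member[a][c] for member in committee_forces]
            variance += pstdev(component) ** 2
        per_atom.append(math.sqrt(variance))
    return max(per_atom)


def select_batch(trajectory, n_pre, n_post):
    """Pick n_post configurations for labeling from one sample trajectory.

    trajectory is a list of committee force predictions, one per sample.
    Step 1: slice out up to n_pre regularly spaced candidates.
    Step 2: keep the n_post candidates with the highest uncertainty
            (assumed subsampling rule).
    """
    stride = max(1, len(trajectory) // n_pre)
    candidates = trajectory[::stride][:n_pre]
    ranked = sorted(candidates, key=force_uncertainty, reverse=True)
    return ranked[:n_post]


if __name__ == "__main__":
    # Toy trajectory: one atom, two committee members whose disagreement
    # grows with the sample index i.
    toy = [[[[0.0, 0.0, 0.0]], [[2.0 * i, 0.0, 0.0]]] for i in range(10)]
    batch = select_batch(toy, n_pre=5, n_post=2)
    print([force_uncertainty(cfg) for cfg in batch])
```

In a real workflow the selected configurations would then be labeled with DFT and added to the training set before the next iteration; the committee models, the trajectory format, and the selection rule would all follow the paper's actual definitions rather than the placeholders used here.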

Paper Structure

This paper contains 18 sections, 3 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: a) Schematic plot of an NS simulation, indicating the monotonically decreasing energy of the NS samples as well as a sequence of regularly spaced samples ${R_i}$ as green dots together with the hypershells in configuration space that can be assigned to them. b) Schematic plot of an $M=4$ RENS simulation, indicating the sample trajectories as black lines and the $N_\mathrm{pre}$ sliced-out samples as green circles. The middle panel shows the latter colored according to their uncertainty $\sigma_F$, and the bottom panel shows the remaining batch of $N_\mathrm{post}$ samples after uncertainty subsampling.
  • Figure 2: Comparison of atomic environments for all configurations contained in the GAP-18 silicon database (grey points) and the configurations used as initial database for the silicon AL run (red points). All configurations collected by the AL run are superimposed as blue points.
  • Figure 3: Schedule and metrics of a silicon AL run. a) Schedule imposed on the $\mathrm{Si}$ sampling procedure, varying the number of walkers $K$ and the number of replicas $M$, together with the estimated number of energy evaluations consumed by each RENS simulation. b) and c) Force and energy uncertainties computed for all NS sample trajectories of AL iteration $i$. d) Energy error on the GAP-18 silicon database for the model used for the RENS simulation in iteration $i$. The empty grey circle and dashed grey line indicate the performance of the model including the samples from the last AL iteration $i=11$. Bottom panels show the distribution of the respective quantity, top panels the average values for each AL iteration.
  • Figure 4: Monitoring of key quantities at different iterations $i$ of the AL strategy for silicon. a) Normalized force uncertainties of the NS sample trajectories (considering every 10th sample). b) NS expectation values of the constant-pressure heat capacity $C_P$. c) The mean Steinhardt bond-order parameter $\overline{Q_4}$. d) Distribution of space groups of the $N_\mathrm{post}=100$ AL samples per iteration. Blue and red numbers indicate the values of cut-off bars. The normalized force uncertainty and $\overline{Q_4}$ are dimensionless, and $C_P$ is given in units of $10^{-3} \, \mathrm{eV}\, \mathrm{K}^{-1} \, \mathrm{atom}^{-1}$.
  • Figure 5: Distribution and averaged value of force and energy uncertainties for the Ge AL run. a) Force uncertainty over all NS sample trajectories. b) Energy uncertainty over all NS sample trajectories. Bottom panels show the distribution of the respective quantity, top panels the average values for each AL iteration.
  • ...and 5 more figures