PyAPX: Python toolkit for atomic configuration pattern exploration

Akira Kusaba, Tetsuji Kuboyama, Karol Kawka, Pawel Kempisty, Yoshihiro Kangawa

TL;DR

PyAPX addresses the problem that atomic configurations at fixed composition and lattice can strongly influence material properties. It introduces NA and NAmod encodings for configuration representations and leverages Bayesian optimization (via PHYSBO) to guide DFT energy evaluations toward stable configurations. The h-BCN demonstration shows NAmod outperforms one-hot and NA encodings, with PCA-NAmod offering dimensionality reduction without loss of performance and enabling identification of multiple symmetry-equivalent stable patterns. The toolkit promises broad applicability to crystalline materials and aims to advance materials discovery by enabling efficient configuration-space exploration.

Abstract

In materials discovery, the integration of first-principles calculations with machine learning techniques has been actively studied for two key tasks: crystal structure prediction, which searches for stable structures given a chemical composition, and elemental substitution, which explores chemical compositions that yield desirable properties in a given crystal structure. However, even when both the crystal structure and chemical composition are fixed, material properties can still vary depending on the atomic arrangements (configurations) at crystallographic sites. To support detailed material design, we present PyAPX, a Python toolkit that performs Bayesian searches for stable atomic configurations. A distinctive feature of this initial release is the introduction of encoding methods suited to configuration search, whose performance we evaluate using the h-BCN system. The evaluation confirms that these encodings yield better convergence than the commonly used one-hot encoding. PyAPX is broadly applicable to crystalline materials and is expected to further advance materials discovery.
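To make the one-hot baseline concrete, the sketch below encodes a configuration of the 18-site h-BCN cell (six B, six C, and six N atoms, as described in the Figure 4 caption) and counts how many site assignments exist before any symmetry reduction. The function names, the element ordering, and the example configuration are assumptions made for illustration; they are not taken from PyAPX, and the NA and NAmod encodings are not reproduced here.

```python
from math import factorial

import numpy as np

SPECIES = ("B", "C", "N")  # element order is an arbitrary choice for this sketch


def one_hot_encode(config):
    """Map a sequence of element symbols (one per site) to a flat 0/1 vector."""
    index = {symbol: k for k, symbol in enumerate(SPECIES)}
    encoded = np.zeros((len(config), len(SPECIES)), dtype=int)
    for site, symbol in enumerate(config):
        encoded[site, index[symbol]] = 1
    return encoded.ravel()


def count_configurations(counts):
    """Multinomial coefficient: distinct site assignments, ignoring symmetry."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


# Hypothetical configuration: sites 0-5 hold B, 6-11 hold C, 12-17 hold N.
config = ["B"] * 6 + ["C"] * 6 + ["N"] * 6
x = one_hot_encode(config)
print(x.shape)                          # (54,)
print(count_configurations([6, 6, 6]))  # 17153136
```

With roughly 17 million assignments even for this small cell, evaluating every candidate by DFT is impractical, which is the motivation for a guided search. Many of these assignments are symmetry-equivalent, so the number of physically distinct patterns is smaller than this count.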

Paper Structure

This paper contains 7 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: (a, b) Typical problem settings in materials discovery based on first-principles calculations (and machine learning), and (c) the problem setting focused on in this study. Input refers to the design variables that are fixed in the discovery system, while Output denotes those to be optimized.
  • Figure 2: Scheme of the PyAPX workflow. Items in black boxes are prepared by the user. The DFT input files are automatically generated within the sequential loop of sampling and DFT calculations, which continues until the specified number of sampling iterations is reached.
  • Figure 3: Schematic illustration of (a) the neighbor-atom (NA) encoding and (b) the modified neighbor-atom (NAmod) encoding, using a honeycomb lattice system composed of three elements, Z1, Z2, and Z3, as an example. Green, brown, and purple circles represent Z1, Z2, and Z3 atoms, respectively, and the multicolored double circles indicate the convolution of atomic occupancies at neighboring sites.
  • Figure 4: An example of an atomic configuration pattern in the ($3\times3$) periodic h-BCN system. This system contains 18 sites, and the numbers in the figure indicate their site indices. Six B atoms, six C atoms, and six N atoms, represented by green, brown, and blue spheres, respectively, occupy the sites. The crystal structure was visualized using VESTA (Momma and Izumi, 2011).
  • Figure 5: Sampling histories obtained by Bayesian optimization for each encoding case: (a) one-hot, (b) NA, (c) modified NA, and (d) modified NA with PCA. The legends are shared among panels (a)–(d), and the red circle markers correspond to the most stable configurations. Statistical plots for comparing their optimization performance: (e) cumulative minimum, (f) moving average, and (g)–(h) moving standard deviation. The legends are shared among panels (e)–(h), and the window size for the moving averages and moving standard deviations was set to 51 samples.
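
The three statistics named in the Figure 5 caption can be computed from a sampling history as in the minimal sketch below. It assumes the history is a one-dimensional array of energies in sampling order, uses a window that only covers fully populated positions, and takes the population standard deviation; the caption does not state whether the window is centered or trailing, nor which normalization is used, so those choices are assumptions. The energies here are synthetic random numbers, not results from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cumulative_minimum(energies):
    """Lowest energy found up to and including each sampling step."""
    return np.minimum.accumulate(np.asarray(energies, dtype=float))


def moving_stats(energies, window=51):
    """Mean and population standard deviation over each full window."""
    windows = sliding_window_view(np.asarray(energies, dtype=float), window)
    return windows.mean(axis=1), windows.std(axis=1)


rng = np.random.default_rng(0)
energies = rng.normal(size=200)  # synthetic stand-in for a sampling history

best_so_far = cumulative_minimum(energies)
avg, std = moving_stats(energies, window=51)
print(best_so_far.shape, avg.shape, std.shape)  # (200,) (150,) (150,)
```

A falling moving average together with a shrinking moving standard deviation indicates that the search is concentrating on low-energy configurations, while the cumulative minimum shows how quickly the best candidate is reached.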