Completeness of Atomic Structure Representations

Jigyasa Nigam, Sergey N. Pozdnyakov, Kevin K. Huguenin-Dumittan, Michele Ceriotti

TL;DR

The paper addresses the challenge of obtaining a complete, symmetry-adapted local representation for atomic environments, and highlights the incompleteness of common density-based descriptors at finite body orders. It introduces a finite, triplet-based descriptor built from the coordinates of neighbors relative to two tagged neighbors, combined with a nonlinear encoder. The resulting representation is invariant under $O(3)$ operations and neighbor permutations, and complete up to a controllable resolution. Completeness is proven for the local neighborhood and demonstrated on a deliberately constructed set of bispectrum-degenerate $B_8$ structures: nonlinear triplet features distinguish degenerate pairs that linear triplet or bispectrum descriptors cannot, which enables universal approximators for local properties. The work also connects to the ACE/NICE and MTP frameworks, showing how a low-order nonlinear triplet representation can achieve completeness without requiring arbitrarily high-order linear expansions, with practical improvements in accuracy and stability.

Abstract

In this paper, we address the challenge of obtaining a comprehensive and symmetric representation of point particle groups, such as atoms in a molecule, which is crucial in physics and theoretical chemistry. The problem has become even more important with the widespread adoption of machine-learning techniques in science, as it underpins the capacity of models to accurately reproduce physical relationships while being consistent with fundamental symmetries and conservation laws. However, some of the descriptors that are commonly used to represent point clouds -- most notably those based on discretized correlations of the neighbor density, which underpin most of the existing ML models of matter at the atomic scale -- are unable to distinguish between special arrangements of particles in three dimensions. This makes it impossible to machine-learn their properties. Atom-density correlations are provably complete in the limit in which they simultaneously describe the mutual relationship between all atoms, which is impractical. We present a novel approach to construct descriptors of finite correlations based on the relative arrangement of particle triplets, which can be employed to create symmetry-adapted models with universal approximation capabilities and with the resolution of the neighbor discretization as the sole convergence parameter. Our strategy is demonstrated on a class of atomic arrangements that are specifically built to defy a broad class of conventional symmetric descriptors, showcasing its potential for addressing their limitations.
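The kind of degeneracy the abstract refers to can be illustrated in its simplest form, at the level of a plain list of neighbor distances. The sketch below is not taken from the paper: the two toy environments and the helper names `sorted_distances` and `angle_deg` are hypothetical, and the example is far simpler than the bispectrum-degenerate structures the authors construct. It only shows that two different environments can share an identical low-order invariant descriptor.

```python
import math

def sorted_distances(neighbors):
    """Sorted center-neighbor distances: invariant to rotations and permutations."""
    return sorted(round(math.hypot(*r), 12) for r in neighbors)

def angle_deg(r1, r2):
    """Angle (in degrees) subtended at the center by two neighbors."""
    dot = sum(a * b for a, b in zip(r1, r2))
    cos_theta = dot / (math.hypot(*r1) * math.hypot(*r2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))

# Two hypothetical environments: the center sits at the origin, and both
# neighbors are at unit distance. Only the angle between them differs.
env_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
env_b = [(1.0, 0.0, 0.0),
         (math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3), 0.0)]

same_descriptor = sorted_distances(env_a) == sorted_distances(env_b)
angle_a = angle_deg(*env_a)
angle_b = angle_deg(*env_b)
print(same_descriptor, round(angle_a, 6), round(angle_b, 6))
```

Any model that sees only the distance list must assign the same prediction to both environments, even though their geometries differ. Adding angular information removes this particular degeneracy, but, as the abstract notes, higher-order density correlations have their own degenerate arrangements.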

Paper Structure (12 sections, 14 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: (a) Representations of an atomic environment $A_i$ in terms of neighbor-atom pairs, or symmetrized three-point correlations. The two schemes are equivalent in the complete basis set limit and generalize to higher-order representations. (b) A pair of environments that are degenerate to $\nu=1$ (list-of-distances) representations.
  • Figure 2: (a) 3-center-1-neighbor features describe the position of atoms relative to two tagged atoms around the center $i$. This is equivalent to a list of tetrahedra sharing the $ii_1i_2$ triangle. (b) Tagging two atoms defines a local orthogonal coordinate system, except when the tagged atoms are collinear with the center, in which case the azimuthal directions are ill-defined. (c) An encoder-decoder architecture can be used to determine non-linear environment features $\tilde{\upxi}_b$ that contain enough information to reproduce high-order representations.
  • Figure 3: (a) Correlation plot of the distances between pairs of structures from the B$_8$ dataset, computed based on the bispectrum ($d_3$, red), the intermediate encoded features from Model A$^\rho_{\text{NL}}$ ($d_{\tilde{\xi}}$, blue), and the full-body features ($d_7$), taken as fully-discriminating descriptors. Configurations are randomly distorted ($\tilde{A}^+$ in the image) to reveal the behavior in the vicinity of the singular points, for which the bispectrum distance would be exactly zero (a bispectrum-degenerate pair $A^+$ and $A^-$ is also shown in the inset). (b) Predictions for 250 representative pairs of the validation set from models B$_{\text{NL}}$ (blue) and D$_{\text{NL}}$ (red). Degenerate pairs are joined by a line. The bispectrum-based predictions are identical for the two structures in a pair, whereas the triplet features $\tilde{\xi}$ can resolve the energy difference within the pair and achieve better accuracy overall.
  • Figure S4: (a) Convergence of the overall RMSE in the prediction of the target features using encoder/decoder architectures with varying hidden-layer dimension (latent-space size), considering both structures in the degenerate pairs (A$^+$, A$^-$). (b) Convergence of the relative RMSE in the difference between the features of the two structures in each degenerate pair.
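
The local coordinate system described in the caption of Figure 2(b) can be sketched with a Gram-Schmidt construction. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`local_frame`, `frame_coordinates`, `rotate`), the tolerance, and the example coordinates are all hypothetical. Positions are given relative to the central atom, and the frame is built from the two tagged neighbors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v, tol=1e-10):
    n = math.sqrt(dot(v, v))
    if n < tol:  # assumed tolerance, chosen for illustration only
        raise ValueError("tagged atoms are collinear: frame is ill-defined")
    return tuple(x / n for x in v)

def local_frame(r1, r2):
    """Orthonormal frame (e1, e2, e3) built from two tagged neighbor vectors."""
    e1 = normalize(r1)
    proj = dot(r2, e1)
    # Remove the component of r2 along e1; it vanishes in the collinear case.
    e2 = normalize(tuple(x - proj * e for x, e in zip(r2, e1)))
    e3 = cross(e1, e2)
    return e1, e2, e3

def frame_coordinates(neighbors, r1, r2):
    """Coordinates of each neighbor expressed in the frame of the tagged pair."""
    frame = local_frame(r1, r2)
    return [tuple(dot(r, e) for e in frame) for r in neighbors]

def rotate(v, alpha, beta):
    """Proper rotation: by alpha about z, then by beta about x."""
    x, y, z = v
    x, y = (math.cos(alpha) * x - math.sin(alpha) * y,
            math.sin(alpha) * x + math.cos(alpha) * y)
    y, z = (math.cos(beta) * y - math.sin(beta) * z,
            math.sin(beta) * y + math.cos(beta) * z)
    return (x, y, z)

# Hypothetical environment: two tagged neighbors and two further neighbors.
tagged_1, tagged_2 = (1.0, 0.2, 0.0), (0.3, 1.1, 0.4)
others = [(-0.5, 0.7, 0.9), (0.2, -1.0, 0.6)]

before = frame_coordinates(others, tagged_1, tagged_2)
rot = lambda v: rotate(v, 0.7, -1.3)
after = frame_coordinates([rot(r) for r in others], rot(tagged_1), rot(tagged_2))
max_change = max(abs(a - b) for p, q in zip(before, after) for a, b in zip(p, q))
print(max_change < 1e-12)
```

Coordinates expressed in this frame do not change when the whole environment is rotated, which is why tagging two atoms yields rotation-invariant inputs. Because `e3` is defined through a cross product, the third coordinate changes sign under mirror operations, so full $O(3)$ invariance needs extra handling that this sketch does not attempt. When the tagged atoms are collinear with the center, `local_frame` raises an error, matching the ill-defined case the caption mentions.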