Higher-order, generically complete, continuous, and polynomial-time isometry invariants of periodic sets

Daniel E Widdowson, Vitaliy A Kurlin

TL;DR

This work tackles the problem of distinguishing novel periodic crystals from near-duplicates under isometry by introducing a hierarchy of generically complete invariants that are Lipschitz continuous under noise. The core constructs are higher-order Pointwise Distance Distributions $\mathrm{PDD}^{\{h\}}(S;k)$, their concatenated form $\mathrm{PDD}^{(h)}$, and the 1D Pointwise Shift Distribution $\mathrm{PSD}$, augmented by moments $\mu^{(t)}[\mathrm{PDD}^{\{h\}}]$ and PDA/ADA variants; the invariants are compared via the Earth Mover's Distance $\mathrm{EMD}$ with ground metrics $L_q$ or RMS. The authors prove isometry invariance and, in 1D, completeness of $\mathrm{PSD}$; they show that $\mathrm{PDD}^{\{2\}}$ distinguishes all known homometric counter-examples in $\mathbb{R}^3$, and they give a polynomial-time computation for a fixed dimension $n$. They also derive the asymptotic behavior of $\mathrm{PDD}^{\{h\}}(S;k)$ and establish near-linear computational complexity in small dimensions, which makes large crystal databases tractable. Empirically, on ICSD, MP, and GNoME, the invariants detect thousands of near-duplicates and support fast, hierarchical novelty screening, which strengthens the integrity of crystallographic databases and supports reliable materials discovery.
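As a rough illustration of the comparison step, the sketch below computes an Earth Mover's Distance between two small distributions of rows. It assumes the standard transport-based EMD with an $L_\infty$ ground metric, and it covers only the special case of equally many rows with uniform weights, where an optimal flow is a one-to-one matching. The names `l_inf` and `emd_uniform` are hypothetical and are not taken from the paper or its code.

```python
from itertools import permutations

def l_inf(row_a, row_b):
    """L-infinity ground distance between two rows of equal length."""
    return max(abs(x - y) for x, y in zip(row_a, row_b))

def emd_uniform(rows_a, rows_b, ground=l_inf):
    """EMD between two distributions with the same number of rows and
    uniform weights 1/n. In this special case an optimal flow can be taken
    to be a one-to-one matching, so a brute-force search over permutations
    suffices (only sensible for very small n)."""
    n = len(rows_a)
    if n != len(rows_b):
        raise ValueError("this sketch only handles equal numbers of rows")
    best = min(
        sum(ground(rows_a[i], rows_b[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n  # each matched pair carries weight 1/n

# Illustrative rows, not data from the paper.
rows_a = [(1.0, 2.0), (1.0, 3.0)]
rows_b = [(1.0, 2.5), (1.0, 3.0)]
print(emd_uniform(rows_a, rows_b))
```

For distributions with non-uniform weights or different numbers of rows, a general transport solver would be needed instead of the permutation search.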

Abstract

Periodic point sets model all solid crystalline materials (crystals) whose atoms can be considered zero-sized points with or without atomic types. This paper addresses the fundamental problem of checking whether claimed crystals are novel, not noisy perturbations of known materials obtained by unrealistic atomic replacements. Such near-duplicates have skewed ground-truth because past comparisons relied on unstable cells and symmetries. The proposed Lipschitz continuity under noise is a new essential requirement for machine learning on any data objects that have ambiguous representations and live in continuous spaces. For periodic point sets under isometry (any distance-preserving transformation), we designed invariants that distinguish all known counter-examples to the completeness of past descriptors and detect thousands of (near-)duplicates in large high-profile databases of crystals within two days on a modest desktop computer.

Paper Structure

This paper contains 8 sections, 15 theorems, 29 equations, 11 figures, 18 tables.

Key Result

Lemma 3.3

For any integers $h,k\geq 1$ and $1\leq l\leq n$, and for any finite unordered set $S$ in a metric space or any $l$-periodic point set $S\subset\mathbb R^n$, the higher-order $\mathrm{PDD}^{\{h\}}(S;k)$ from Definition 3.1 is an isometry invariant of $S$.
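The invariance claim can be checked numerically in the simplest case. The sketch below handles only the first-order case ($h=1$) for a finite planar set, since the higher-order definition is not reproduced on this page. It assumes that each PDD row lists a point's distances to its $k$ nearest neighbours in increasing order; equal rows are kept with uniform weights rather than collapsed. The helper names `pdd_rows`, `rigid_motion`, and `rows_close` are hypothetical.

```python
import math

def pdd_rows(points, k):
    """Rows of a first-order PDD of a finite point set: for each point,
    the distances to its k nearest neighbours in increasing order.
    Rows are sorted lexicographically; equal rows are kept, not collapsed."""
    rows = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rows.append(tuple(dists[:k]))
    return sorted(rows)

def rigid_motion(points, angle, shift):
    """Rotate planar points by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]

def rows_close(rows_a, rows_b, tol=1e-9):
    """Entry-wise comparison of two row lists up to floating-point error."""
    return len(rows_a) == len(rows_b) and all(
        math.isclose(x, y, abs_tol=tol)
        for ra, rb in zip(rows_a, rows_b)
        for x, y in zip(ra, rb)
    )

# Illustrative points, not data from the paper.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
moved = rigid_motion(pts, angle=0.7, shift=(5.0, -2.0))
print(rows_close(pdd_rows(pts, 3), pdd_rows(moved, 3)))
```

The check works because the rows depend only on inter-point distances, which a rigid motion preserves; it is a sanity test for one example, not a proof.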

Figures (11)

  • Figure 1: Left: any periodic point set can be given by many pairs (cell, motif), see Definition 1.1. Any periodic set has vastly different finite subsets within boxes or balls of the same cut-off size. Right: almost any perturbation can arbitrarily scale up a unit cell and break the symmetry.
  • Figure 2: For any $0<r\leq 1$, the homometric sets $S(r)=\{0,r,2+r,4\}+8\mathbb Z\not\cong Q(r)=\{0,2+r,4,4+r\}+8\mathbb Z$ have identical PDFs from Definition 2.4 but different $\mathrm{PDD}$s whose first columns we write as unordered sets: $\mathrm{PDD}(S(r);1)=\{r,r,2-r,2-r\}\neq \mathrm{PDD}(Q(r);1)=\{r,r,2-r,2+r\}$.
  • Figure 3: The sets $S,Q$ are 1-periodic in the $x$-axis with period 4, e.g. $A$ denotes both $(0,a)$, $(4,a)$. Right: distances between closest points from classes modulo shifts by $4$ in $x$. Then $\mathrm{PDD}(S;k)=\mathrm{PDD}(Q;k)$ by Example 2.5 but $\mathrm{PDD}^{\{2\}}(S;1)\neq\mathrm{PDD}^{\{2\}}(Q;1)$ by Example \ref{exa:6-point_pairs}.
  • Figure 4: Left: a comparison of Pauling's crystals $P(\pm u)$ for $u=0.03$ \cite{pauling1930crystal} by COMPACK \cite{chisholm2005compack}, which aligns subsets of 15 atoms. The atoms from different $P(\pm 0.03)$ are shown in green and gray. Right: the distance $\mathrm{EMD}_\infty$ from Definition \ref{dfn:EMD}(b) between the $\mathrm{PDD}^{\{h\}}$ for $k=100$ of Pauling's crystals $P(\pm u)$, which depend on $u\in[0,0.25]$ and are identical at the boundary values.
  • Figure 5: The distance $\mathrm{EMD}_\infty^{\{2\}}[100]$ between the 1-periodic sets $S,Q$ in Fig. 3, which have identical $\mathrm{PDD}$s. The average and minimum of $\mathrm{EMD}_\infty^{\{2\}}[100]$ were computed for uniformly sampled parameters $a,b,c$ from Example \ref{exa:6-point_pairs}. These sets $S,Q$ are isometric for $b\in\{0,1\}$, and $\mathrm{EMD}_\infty^{\{2\}}[100]>0$ for $0<b<1$, which experimentally confirms that $S\not\simeq Q$, see Example \ref{exa:6-point_pairs}.
  • ...and 6 more figures
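
The first columns quoted in the Figure 2 caption can be reproduced with a short computation. The sketch below fixes $r=0.5$ and assumes $Q(r)=\{0,2+r,4,4+r\}+8\mathbb Z$, a choice that yields the stated column $\{r,r,2-r,2+r\}$. It also compares the multisets of cyclic differences modulo the period, used here as a stand-in for the homometric condition. The function names `first_column` and `cyclic_differences` are hypothetical.

```python
def first_column(motif, period, k=1):
    """For each motif point of the 1-periodic set motif + period*Z, the
    distance to its k-th nearest neighbour, returned as a sorted list.
    Assumes all motif points lie in [0, period)."""
    column = []
    for i, p in enumerate(motif):
        dists = []
        for j, q in enumerate(motif):
            # Shifts by up to k periods on each side always contain the
            # k nearest neighbours of a point in the motif.
            for m in range(-k, k + 1):
                if i == j and m == 0:
                    continue  # skip the point itself
                dists.append(abs(q + m * period - p))
        column.append(sorted(dists)[k - 1])
    return sorted(column)

def cyclic_differences(motif, period):
    """Sorted multiset of differences (q - p) mod period over ordered pairs
    of distinct motif points."""
    return sorted((q - p) % period for p in motif for q in motif if p != q)

r = 0.5
S = [0.0, r, 2.0 + r, 4.0]
Q = [0.0, 2.0 + r, 4.0, 4.0 + r]  # assumed form of Q(r), see the lead-in

print(first_column(S, 8.0))  # compare with {r, r, 2-r, 2-r}
print(first_column(Q, 8.0))  # compare with {r, r, 2-r, 2+r}
print(cyclic_differences(S, 8.0) == cyclic_differences(Q, 8.0))
```

With $r=0.5$ every value is exactly representable in floating point, so the lists can be compared with plain equality; other values of $r$ would need a tolerance.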

Theorems & Definitions (44)

  • Definition 1.1: lattice, motif, $l$-periodic set
  • Definition 2.1: bottleneck distance $d_B$
  • Definition 2.2: metrics vs pseudo-metrics
  • Definition 2.3: Pointwise Distance Distribution $\mathrm{PDD}$
  • Definition 2.4: homometric sets
  • Example 2.5: sets with equal PDDs
  • Definition 3.1: higher order $\mathrm{PDD}^{\{h\}}(S;k)$
  • Example 3.2: $\mathrm{PDD}^{(2)}$ for the sequences in Fig. 2
  • Lemma 3.3: invariance of $\mathrm{PDD}^{\{h\}}(S;k)$
  • proof
  • ...and 34 more