Higher-order, generically complete, continuous, and polynomial-time isometry invariants of periodic sets
Daniel E Widdowson, Vitaliy A Kurlin
TL;DR
This work addresses the problem of distinguishing genuinely novel periodic crystals from near-duplicates under isometry by introducing a hierarchy of generically complete invariants that are Lipschitz continuous under noise. The core constructs are the higher-order Pointwise Distance Distributions $\mathrm{PDD}^{\{h\}}(S;k)$, their concatenated form $\mathrm{PDD}^{(h)}$, and the 1D Pointwise Shift Distribution $\mathrm{PSD}$, augmented by moments $\mu^{(t)}[\mathrm{PDD}^{\{h\}}]$ and the PDA/ADA variants; invariants are compared via the Earth Mover's Distance ($\mathrm{EMD}$) with an $L_q$ or RMS ground metric. The authors prove isometry invariance and, in 1D, completeness of $\mathrm{PSD}$; they show that $\mathrm{PDD}^{\{2\}}$ distinguishes all known homometric counter-examples in $\mathbb{R}^3$, and they give a practical polynomial-time computation for fixed $n$. They also derive the asymptotic behavior of $\mathrm{PDD}^{\{h\}}(S;k)$ and establish near-linear computational complexity in small dimensions, which makes large crystal databases tractable. Empirically, on ICSD, MP, and GNoME, the invariants detect thousands of near-duplicates and support fast, hierarchical novelty screening, strengthening the integrity of crystallographic databases and supporting reliable materials discovery.
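To make the base construct concrete, the following is a minimal sketch of a first-order PDD. It assumes the usual definition (one row per motif point holding the distances to its $k$ nearest neighbours in the full periodic set, identical rows collapsed into weights, rows ordered lexicographically) and does not cover the higher-order $\mathrm{PDD}^{\{h\}}$, whose definition is not given in this summary. The function name `pdd`, the `reps` parameter, and the rounding tolerance are hypothetical choices for illustration, not the authors' implementation.

```python
import itertools
import math
from collections import Counter


def pdd(cell, motif, k, reps=3):
    """Sketch of a first-order PDD as a list of (weight, row) pairs.

    cell  : list of basis vectors (Cartesian coordinates)
    motif : list of motif points (Cartesian coordinates)
    k     : number of nearest neighbours per motif point
    reps  : number of cell shells to generate (assumption: large enough
            that the k nearest neighbours lie inside them; not verified)
    """
    dim = len(cell)
    # Lattice translations with integer coefficients in [-reps, reps].
    shifts = [
        [sum(c * cell[i][j] for i, c in enumerate(coeffs)) for j in range(dim)]
        for coeffs in itertools.product(range(-reps, reps + 1), repeat=dim)
    ]
    rows = []
    for p in motif:
        dists = []
        for q in motif:
            for s in shifts:
                d = math.dist(p, [q[j] + s[j] for j in range(dim)])
                if d > 1e-12:  # skip the point itself
                    dists.append(d)
        dists.sort()
        # Round so that rows equal up to floating-point noise collapse.
        rows.append(tuple(round(d, 10) for d in dists[:k]))
    counts = Counter(rows)
    m = len(motif)
    # Collapse identical rows into weights, then order rows lexicographically.
    return sorted(((n / m, row) for row, n in counts.items()), key=lambda wr: wr[1])


# The integer lattice described by a unit cell and by a doubled cell.
unit = pdd([[1.0]], [[0.0]], k=4)
doubled = pdd([[2.0]], [[0.0], [1.0]], k=4)
print(unit == doubled)
```

The two calls describe the same 1D set with different cells and motifs, so the outputs should coincide; this is the cell-independence that cell-based comparisons lack. Comparing two such outputs with different rows would additionally need an Earth Mover's Distance solver, which this sketch omits.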
Abstract
Periodic point sets model all solid crystalline materials (crystals) whose atoms can be considered zero-sized points with or without atomic types. This paper addresses the fundamental problem of checking whether claimed crystals are novel, not noisy perturbations of known materials obtained by unrealistic atomic replacements. Such near-duplicates have skewed ground-truth because past comparisons relied on unstable cells and symmetries. The proposed Lipschitz continuity under noise is a new essential requirement for machine learning on any data objects that have ambiguous representations and live in continuous spaces. For periodic point sets under isometry (any distance-preserving transformation), we designed invariants that distinguish all known counter-examples to the completeness of past descriptors and detect thousands of (near-)duplicates in large high-profile databases of crystals within two days on a modest desktop computer.
