Recognition of near-duplicate periodic patterns by continuous metrics with approximation guarantees
Olga Anosova, Daniel Widdowson, Vitaliy Kurlin
TL;DR
The paper tackles the problem of recognizing near-duplicate periodic patterns under rigid motion in Euclidean space. It introduces a boundary-tolerant local metric $BT$ and extends it, via the Earth Mover's Distance, to a metric on isosets, which are complete invariants of periodic point sets. The paper proves Lipschitz continuity of the resulting distance and provides polynomial-time approximation guarantees, enabling robust comparison of periodic point sets in any dimension. The core result is a complete isometry classification via isosets: two periodic sets are isometric if and only if their isosets at a common stable radius are bijectively equivalent with matching weights. Applications to crystal databases (CSD, GNoME) demonstrate practical utility for detecting near-duplicates and ensuring data integrity, and the metric distinguishes mirror images where prior descriptors fail.
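For orientation, the Earth Mover's Distance mentioned above can be written in its standard transport form. The notation here is an assumption made for illustration and is not taken from the paper: the isosets $I(S)$ and $I(Q)$ are treated as finite weighted collections of classes $\sigma_i$ with weights $w_i$ and $\xi_j$ with weights $v_j$, each set of weights summing to 1, with $BT$ as the ground metric.

```latex
\mathrm{EMD}\bigl(I(S), I(Q)\bigr)
  = \min_{f_{ij} \ge 0} \sum_{i,j} f_{ij}\, BT(\sigma_i, \xi_j)
\quad \text{subject to} \quad
\sum_{j} f_{ij} = w_i, \qquad
\sum_{i} f_{ij} = v_j, \qquad
\sum_{i} w_i = \sum_{j} v_j = 1 .
```

In this form the flows $f_{ij}$ say how much weight of class $\sigma_i$ is matched to class $\xi_j$, so the distance stays small when classes move slightly or split into nearby classes.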
Abstract
This paper rigorously solves the challenging problem of recognizing periodic patterns under rigid motion in Euclidean geometry. The 3-dimensional case is practically important for justifying the novelty of solid crystalline materials (periodic crystals) and for patenting medical drugs in a solid tablet form. Past descriptors based on finite subsets fail because a unit cell of a periodic pattern changes discontinuously under almost any perturbation of atoms, which is inevitable due to noise and atomic vibrations. The major problem is not only to find complete invariants (descriptors with no false negatives and no false positives for all periodic patterns) but also to design efficient algorithms for distance metrics on these invariants that behave continuously under noise. The proposed continuous metrics solve this problem in any Euclidean dimension and are algorithmically approximated with small error factors in times that are explicitly bounded in the size and complexity of a given pattern. The proved Lipschitz continuity allows us to confirm all near-duplicates filtered by simpler invariants in major databases of experimental and simulated crystals. This practical detection of noisy duplicates will stop the artificial generation of 'new' materials from slight perturbations of known crystals. Several such duplicates are under investigation by five journals for data integrity.
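The discontinuity of unit cells under noise can be seen in a one-dimensional toy case. The sketch below is not the paper's algorithm: the helper names `circular_distance` and `minimal_period` are hypothetical, and the example assumes a periodic set on a line given by motif points in a cell of a given length. It computes the smallest period of such a set and shows that moving a single point by 0.001 doubles that period.

```python
def circular_distance(a, b, length):
    """Distance between two positions on a circle of the given length."""
    d = abs(a - b) % length
    return min(d, length - d)


def minimal_period(motif, cell_length, tol=1e-9):
    """Smallest p = cell_length / k such that a shift by p maps the set to itself.

    Hypothetical helper for a 1D toy example. It assumes the motif points
    are distinct modulo cell_length and separated by more than 2 * tol.
    """
    pts = [x % cell_length for x in motif]
    m = len(pts)
    # A shift by cell_length / k can preserve the set only if k divides m.
    for k in range(m, 0, -1):
        if m % k != 0:
            continue
        shift = cell_length / k
        preserved = all(
            any(circular_distance((x + shift) % cell_length, y, cell_length) <= tol
                for y in pts)
            for x in pts
        )
        if preserved:
            return shift
    return cell_length


# The integer lattice, written with two points in a cell of length 2.
exact = minimal_period([0.0, 1.0], 2.0)
# The same set after moving one point by 0.001.
perturbed = minimal_period([0.0, 1.001], 2.0)
print(exact, perturbed)
```

Any descriptor built from a minimal cell inherits this jump, which is why the abstract asks for metrics that behave continuously under such perturbations.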
