Geometric Data Science
Olga D Anosova, Vitaliy A Kurlin
TL;DR
This work formulates Geometric Data Science as a rigorous framework for comparing real objects via moduli spaces under practical equivalence relations, anchoring the analysis in invariants and metrics with guaranteed continuity and polynomial-time computability. It develops complete, Lipschitz-continuous invariants for finite clouds (BRI, PCI, WMI, PDD/SDD/SCD) and extends these ideas to periodic objects (1-periodic sequences, lattices, and density functions), delivering hierarchical invariants and efficient comparison algorithms. A key contribution is the construction of geomaps and moduli-space embeddings (e.g., RI/PI spaces, SLM spherical mapping) that support continuous navigation of object universes, including biological macromolecules and crystalline materials. The framework unifies classical invariants (distance matrices, Gram matrices) with modern metric geometry, enabling fast, scalable detection of duplicates, isometry-invariant classification, and continuous measures of chirality and symmetry across both finite and periodic data. These advances have practical implications for materials discovery, protein structure analysis, and crystallography, providing tools to systematically explore and compare vast geometric datasets.
Abstract
This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the running times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points. The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is a hierarchy of invariants ranging from ultra-fast to complete. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev's table to the full crystal universe.
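To make the permutation issue for finite clouds concrete, the following minimal sketch computes a simple distance-based descriptor that ignores both point order and rigid motion, in time polynomial in the number of points. It is only an illustration and not one of the book's complete invariants: the function names (`sorted_neighbor_distances`, `rigid_motion_2d`, `rows_close`) and the test clouds are hypothetical, and the descriptor uses distances alone, so it also cannot tell a cloud from its mirror image.

```python
import math

def sorted_neighbor_distances(points, k):
    """For each point, take the distances to its k nearest neighbours in
    increasing order, then sort the rows lexicographically so that the
    result does not depend on the order of the input points.
    Cost: O(n^2 log n) for n points, with no search over permutations."""
    rows = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rows.append(tuple(dists[:k]))
    return sorted(rows)

def rigid_motion_2d(points, angle, shift):
    """Rotate every point by `angle` about the origin, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]

def rows_close(a, b, tol=1e-9):
    """Compare two descriptors entry by entry up to floating-point tolerance."""
    return len(a) == len(b) and all(
        len(ra) == len(rb) and all(abs(x - y) <= tol for x, y in zip(ra, rb))
        for ra, rb in zip(a, b)
    )

# Hypothetical example clouds in the plane.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
# Same cloud after a rotation, a translation, and a reversal of point order.
moved = rigid_motion_2d(cloud, 0.7, (5.0, -3.0))[::-1]
# A cloud that is not congruent to the first one.
other = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 2.0)]

same = rows_close(sorted_neighbor_distances(cloud, 2),
                  sorted_neighbor_distances(moved, 2))
different = not rows_close(sorted_neighbor_distances(cloud, 2),
                           sorted_neighbor_distances(other, 2))
print(same, different)
```

Sorting within rows and then across rows removes the dependence on point labels without enumerating permutations, which is the kind of shortcut the exponential-complexity remark refers to. Lexicographic sorting of floating-point rows can be sensitive to near-ties, so a practical comparison would need a more careful metric than this entrywise check.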
