Geographic-style maps with a local novelty distance help navigate the materials space

Daniel E Widdowson, Vitaliy A Kurlin

TL;DR

The paper defines the Crystal Isometry Space $\mathrm{CRIS}(\mathbb{R}^3)$ to formalize comparisons of crystals under isometry and introduces the Local Novelty Distance (LND) metric, which uses generically complete invariants derived from the Pointwise Distance Distribution $\mathrm{PDD}(S;k)$ and its PDA representation, with distances computed via Earth Mover's Distance (EMD). It demonstrates real-time novelty assessment for 43 A-lab crystals by locating near-duplicates in the ICSD and Materials Project, revealing many pre-existing matches and mapping the material universe with invariant coordinates. The authors show that these invariant maps provide stable, continuous navigation of the materials space, overcoming cell-based discontinuities and enabling rapid novelty checks. The work suggests broad utility for self-driving labs and data integrity, and points to future work connecting invariant coordinates to structure–property relationships. Overall, the approach offers a scalable, geometry-driven framework for detecting duplicates and exploring the materials landscape.

Abstract

With the advent of self-driving labs promising to synthesize large numbers of new materials, new automated tools are required for checking potential duplicates in existing structural databases before a material can be claimed as novel. To avoid duplication, we rigorously define the novelty metric of any periodic material as the smallest distance to its nearest neighbor among already known materials. Using ultra-fast structural invariants, all such nearest neighbors can be found within seconds on a typical computer even if a given crystal is disguised by changing a unit cell, perturbing atoms, or replacing chemical elements. This real-time novelty check is demonstrated by finding near-duplicates of the 43 materials produced by Berkeley's A-lab in the world's largest collections of inorganic structures, the Inorganic Crystal Structure Database and the Materials Project. To help future self-driving labs successfully identify novel materials, we propose navigation maps of the materials space where any new structure can be quickly located by its invariant descriptors similar to a geographic location on Earth.
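As a rough illustration of "the smallest distance to its nearest neighbor", the sketch below computes a local novelty distance over a toy dataset. It is a simplified assumption, not the authors' implementation: every structure is represented by a few equally weighted rows of invariant values, the Earth Mover's Distance is solved by brute force over one-to-one matchings (exact only for equal row counts and uniform weights), and the ground distance between rows is assumed to be L-infinity. The names `emd_uniform`, `local_novelty_distance`, `Q1`, `Q2` and all numbers are hypothetical.

```python
from itertools import permutations

def chebyshev(u, v):
    """L-infinity distance between two rows of equal length (assumed ground metric)."""
    return max(abs(a - b) for a, b in zip(u, v))

def emd_uniform(rows_a, rows_b):
    """EMD between two equally sized sets of uniformly weighted rows.

    With equal counts and equal weights, an optimal transport plan is a
    one-to-one matching, so a brute-force search over permutations is exact.
    This is only practical for a handful of rows.
    """
    n = len(rows_a)
    if n != len(rows_b):
        raise ValueError("this sketch needs the same number of rows on both sides")
    return min(
        sum(chebyshev(rows_a[i], rows_b[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )

def local_novelty_distance(candidate, dataset):
    """Return (smallest EMD, name of the nearest neighbour) over the dataset."""
    return min((emd_uniform(candidate, rows), name) for name, rows in dataset.items())

# Hypothetical invariant rows, made up for illustration only.
dataset = {
    "Q1": [[1.0, 1.0, 1.4], [1.0, 1.2, 1.4]],
    "Q2": [[2.0, 2.0, 2.0], [2.0, 2.0, 2.8]],
}
candidate = [[1.0, 1.25, 1.4], [1.0, 1.0, 1.45]]

lnd, nearest = local_novelty_distance(candidate, dataset)
print(nearest, round(lnd, 3))
```

A real search over a large database would need a general EMD solver for unequal weights and a faster nearest-neighbor filter than this linear scan.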

Paper Structure

This paper contains 8 sections, 1 theorem, 13 figures, 5 tables.

Key Result

Theorem 6

If $S$ is obtained from a crystal $Q$ in a dataset $D$ by perturbing every point of $Q$ up to $\varepsilon<r(Q)$, then $\mathrm{LND}(S;D)\leq2\varepsilon$. To get $S$ from a crystal $Q\in D$ with $\mathrm{LND}(S;D)<2r(Q)$, some atom of $Q$ should be perturbed by at least $0.5\mathrm{LND}(S;D)$.
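The two inequalities of Theorem 6 can be read as simple arithmetic. The helper functions below are a minimal sketch with hypothetical names and values; $r(Q)$ is taken as a given number because this summary does not define it.

```python
def lnd_upper_bound(eps, r_q):
    """First part of Theorem 6: LND(S;D) <= 2*eps if every point moves by at most eps < r(Q)."""
    if not 0 <= eps < r_q:
        return None  # the theorem makes no claim outside eps < r(Q)
    return 2 * eps

def min_displacement(lnd, r_q):
    """Second part of Theorem 6: some atom moved by at least 0.5*LND(S;D) if LND(S;D) < 2*r(Q)."""
    if not 0 <= lnd < 2 * r_q:
        return None  # the theorem makes no claim outside LND < 2*r(Q)
    return 0.5 * lnd

# Hypothetical values in angstroms, chosen only for illustration.
print(lnd_upper_bound(0.05, 0.9))   # perturbing each atom by up to 0.05 bounds the LND by 0.1
print(min_displacement(0.1, 0.9))   # an LND of 0.1 needs some atom to move by at least 0.05
```

Read together, the two parts say that a small LND certifies a near-duplicate, while a larger LND (still below $2r(Q)$) gives a guaranteed lower bound on how far at least one atom must have moved.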

Figures (13)

  • Figure 1: Almost any tiny perturbation discontinuously scales up a primitive cell and makes any comparison based on cells or motifs unreliable. This discontinuity was resolved without relying on cells widdowson2022resolving, widdowson2025higher.
  • Figure 2: Left: the Crystal Isometry Principle says that all chemistry of any real periodic crystal under standard ambient conditions can be reconstructed from (the isometry class of) the periodic set of atomic centers given with sufficiently precise coordinates widdowson2022resolving. Right: most optimization methods output local optima without exploring the surrounding space. De-fogging this Crystal Isometry Space $\mathrm{CRIS}(\mathbb{R}^3)$ beyond known or predicted materials will enable proper navigation across the crystal universe.
  • Figure 3: The average invariants $\mathrm{AMD}_k$ and $\mathrm{ADA}_k$ from Definition 4 for $k=1,\dots,25$ and five simple crystals from the Materials Project; see more details and perovskite examples in the appendix.
  • Figure 4: The averages of $\mathrm{ADA}_k$ and standard deviations (1 sigma shaded) vs $\sqrt[3]{k}$ for four databases.
  • Figure 5: Left: MnAgO2 synthesized by A-lab. Middle: ICSD entry 670065 with the same composition and $\mathrm{EMD}=0.097$ Å, found by structural invariants in Table \ref{tab:A-lab-ICSD-matches}, though its unit cell is very different from the cell of MnAgO2. Right: another ICSD entry 139006 from 2021, matched by leeman2024challenges and found by unit cell search, which is more distant from MnAgO2 at $\mathrm{EMD}=0.368$ Å on invariants $\mathrm{PDA}(S;100)$.
  • ...and 8 more figures

Theorems & Definitions (8)

  • Definition 1: space of periodic materials
  • Definition 2: isometry invariant $\mathrm{PDD}(S;k)$
  • Definition 3: Point Packing Coefficient $\mathrm{PPC}$
  • Definition 4: invariants $\mathrm{AMD}$, $\mathrm{ADA}$, $\mathrm{PDA}$
  • Definition 5: Local Novelty Distance $\mathrm{LND}(S;D)$
  • Theorem 6
  • Definition 7: Earth Mover's Distance $\mathrm{EMD}$ rubner2000earth
  • Proof of Theorem 6
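The invariants in Definitions 2 to 4 can be sketched by brute force for tiny examples. The code below assumes the definitions used in the authors' related work, which this summary does not spell out: each row of $\mathrm{PDD}(S;k)$ lists the distances from one motif point to its $k$ nearest neighbors, identical rows are merged with summed weights, $\mathrm{AMD}_k$ is the weighted column average, $\mathrm{PPC}(S)=\sqrt[3]{\mathrm{Vol}(U)/(m V_3)}$ with $V_3=4\pi/3$, and $\mathrm{ADA}_k=\mathrm{AMD}_k-\mathrm{PPC}(S)\sqrt[3]{k}$. The function names, the `reach` parameter, and the rounding tolerance are hypothetical choices for this sketch.

```python
from itertools import product
from math import dist, pi

def pdd(cell, motif, k, reach=3, digits=6):
    """Brute-force PDD(S;k) as a sorted list of (row of k distances, weight).

    cell  : three Cartesian basis vectors of a unit cell
    motif : fractional coordinates of the points in that cell
    reach : number of cells scanned in each direction (must be large enough for k)
    """
    def cart(frac, shift):
        return tuple(
            sum((frac[i] + shift[i]) * cell[i][j] for i in range(3))
            for j in range(3)
        )

    shifts = list(product(range(-reach, reach + 1), repeat=3))
    cloud = [cart(f, s) for f in motif for s in shifts]
    rows = {}
    for f in motif:
        p = cart(f, (0, 0, 0))
        # the first sorted distance is zero (p to itself), so skip it
        nearest = sorted(dist(p, q) for q in cloud)[1:k + 1]
        key = tuple(round(d, digits) for d in nearest)
        rows[key] = rows.get(key, 0.0) + 1.0 / len(motif)  # merge identical rows
    return sorted(rows.items())

def amd(pdd_rows):
    """Weighted column averages of the PDD rows."""
    k = len(pdd_rows[0][0])
    return [sum(w * row[i] for row, w in pdd_rows) for i in range(k)]

def ppc(cell, m):
    """Point Packing Coefficient, assuming cbrt(Vol(U) / (m * 4*pi/3))."""
    a, b, c = cell
    vol = abs(
        a[0] * (b[1] * c[2] - b[2] * c[1])
        - a[1] * (b[0] * c[2] - b[2] * c[0])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )
    return (vol / (m * 4.0 * pi / 3.0)) ** (1.0 / 3.0)

def ada(cell, motif, k):
    """AMD_k minus the assumed asymptotic curve PPC * cbrt(k)."""
    coeff = ppc(cell, len(motif))
    averages = amd(pdd(cell, motif, k))
    return [a - coeff * (i + 1) ** (1.0 / 3.0) for i, a in enumerate(averages)]

# The same simple cubic lattice described by a primitive cell and by a doubled cell.
cubic = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
doubled = [(2, 0, 0), (0, 1, 0), (0, 0, 1)]
pdd_primitive = pdd(cubic, [(0, 0, 0)], 6)
pdd_doubled = pdd(doubled, [(0, 0, 0), (0.5, 0, 0)], 6)
print(pdd_primitive == pdd_doubled)
```

The two descriptions of the lattice give the same output because the rows depend only on inter-point distances, not on the chosen cell. This is the property that lets a nearest-neighbor search see through a changed unit cell.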