How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science

Can Polat; Erchin Serpedin; Mustafa Kurban; Hasan Kurban

TL;DR

RADII defines the extrapolation frontier for geometric crystalline generative models. It provides a radius-resolved benchmark of ~75k nanoparticle structures with leakage-free splits that maps generation quality across radii. By treating radius as a continuous scaling axis, RADII shows that all architectures incur a $\sim 13\%$ increase in global RMSD beyond the training range, while local bond fidelity varies dramatically across architectures and the frontier forms a multi-dimensional surface. Well-behaved models obey power-law scaling with exponent $\alpha \approx 1/3$, so their out-of-distribution errors can be forecast from in-distribution fits. The work delivers a reproducible, geometry-grounded diagnostic framework and dataset to guide the development of scalable, geometry-aware generative models for nanomaterials.
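The exponent has a simple geometric reading. Assuming a roughly spherical cluster of radius $R$ with approximately constant atomic number density $\rho$ (an idealization introduced here, not a claim from the paper), the atom count $N$ grows as the cube of the radius, so $\alpha \approx 1/3$ corresponds to error growing roughly linearly with radius:

```latex
N \approx \tfrac{4}{3}\pi\rho R^{3}
\;\Longrightarrow\;
\mathrm{RMSD} \propto N^{\alpha} \propto R^{3\alpha},
\qquad
\alpha \approx \tfrac{1}{3} \;\Longrightarrow\; \mathrm{RMSD} \propto R .
```

For example, doubling the radius multiplies the atom count by about $2^{3} = 8$ but, under this idealization, only doubles the expected RMSD.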

Abstract

Every generative model for crystalline materials harbors a critical structure size beyond which its outputs quietly become unreliable; we call this the extrapolation frontier. Despite its direct consequences for nanomaterial design, this frontier has never been systematically measured. We introduce RADII, a radius-resolved benchmark of ${\sim}$75,000 nanoparticle structures (55 to 11,298 atoms) that treats radius as a continuous scaling knob to trace generation quality from in-distribution to out-of-distribution regimes under leakage-free splits. RADII provides frontier-specific diagnostics: per-radius error profiles pinpoint each architecture's scaling ceiling, surface-interior decomposition tests whether failures originate at boundaries or in the bulk, and cross-metric failure sequencing reveals which aspect of structural fidelity breaks first. Benchmarking five state-of-the-art architectures, we find that: (i) all models degrade by ${\sim}13\%$ in global positional error beyond the training radii, yet local bond fidelity diverges sharply across architectures, ranging from near-zero degradation to more than a $2\times$ collapse; (ii) no two architectures share the same failure sequence, revealing the frontier as a multi-dimensional surface shaped by model family; and (iii) well-behaved models obey a power-law scaling exponent $\alpha \approx 1/3$ whose in-distribution fit accurately predicts out-of-distribution error, making their frontiers quantitatively forecastable. These findings establish output scale as a first-class evaluation axis for geometric generative models. The dataset and code are available at https://github.com/KurbanIntelligenceLab/RADII.
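The surface-interior decomposition can be illustrated with a minimal sketch. The simple cubic lattice, the radial-shell definition of "surface" atoms, and the function names `carve_cluster` and `surface_interior_rmsd` are assumptions made for illustration; the paper's actual cluster construction and surface criterion are not given in this summary. The sketch also assumes predicted positions are already aligned to the reference with atoms in matching order.

```python
import numpy as np


def carve_cluster(a: float, radius: float) -> np.ndarray:
    """Hypothetical stand-in for cluster construction: keep all sites of a simple
    cubic lattice (lattice constant a, in Å) within `radius` Å of the origin."""
    n = int(np.ceil(radius / a))
    axis = np.arange(-n, n + 1) * a
    sites = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    return sites[np.linalg.norm(sites, axis=1) <= radius]


def surface_interior_rmsd(ref: np.ndarray, pred: np.ndarray, shell: float) -> dict:
    """Split positional RMSD into surface and interior contributions.

    Assumes `pred` is already aligned to `ref` with atoms in matching order.
    An atom counts as "surface" if it lies within `shell` Å of the outermost
    reference atom (an assumed radial criterion).
    """
    r = np.linalg.norm(ref - ref.mean(axis=0), axis=1)
    is_surface = r >= r.max() - shell
    sq_err = np.sum((pred - ref) ** 2, axis=1)  # per-atom squared displacement

    def rmsd(mask: np.ndarray) -> float:
        return float(np.sqrt(sq_err[mask].mean())) if mask.any() else float("nan")

    return {
        "global": float(np.sqrt(sq_err.mean())),
        "surface": rmsd(is_surface),
        "interior": rmsd(~is_surface),
    }


# Synthetic example: positional noise whose scale grows toward the cluster surface.
rng = np.random.default_rng(0)
ref = carve_cluster(a=2.5, radius=10.0)
r = np.linalg.norm(ref, axis=1)
sigma = 0.05 + 0.1 * r / r.max()
pred = ref + rng.normal(size=ref.shape) * sigma[:, None]
scores = surface_interior_rmsd(ref, pred, shell=2.5)
print(scores)
```

Because global mean squared error is an atom-weighted average of the surface and interior terms, the global RMSD always lies between the two; the decomposition is informative precisely when that single number hides a large surface-interior gap.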

Paper Structure

This paper contains 23 sections, 1 theorem, 13 equations, 3 figures, and 1 table.

Introduction
Related Work
Geometric Graph Generation for Materials
Scalability Limits of Physics-Based Methods
Evaluation Gaps in Existing Benchmarks
RADII Construction
Task Formulation
Material Selection and Structure Generation
Radius Split Protocol
Quaternion-Based Orientation Sampling
Evaluation Metrics
Generation Quality Measures
Failure Decomposition Diagnostics
Frontier Characterization
Experiments

Key Result

Proposition 1 (Angular separation guarantee)

For any $q_i,q_j\in\mathcal{Q}$ returned by the greedy procedure at spacing $\Delta\theta$,

Figures (3)

Figure 1: From primitive cell to radius-controlled nanoclusters. For each material in the dataset, the panels show, from left to right, the primitive unit cell followed by its canonical R = 6 Å and R = 30 Å nanoparticles. Materials are arranged from top to bottom in ascending order of the atom count of their R = 30 Å cluster, illustrating how coordination environments and bulk-like cores emerge with increasing radius. All views share a common Ångström scale. Atom colours follow the conventional CPK palette.
Figure 2: Extrapolation frontier across multiple dimensions of structural fidelity. (a) Global RMSD increases beyond the in-distribution (ID) boundary, revealing differing degrees of out-of-distribution (OOD) degradation across models. (b) Local bond-geometry errors diverge more strongly, showing that extrapolation behavior depends on the evaluation metric. (c) Surface atoms consistently exhibit higher errors than interior atoms, with similar trends from the ID to the OOD regime, indicating that surface and interior degrade uniformly. (d) Orientation consistency generally degrades alongside positional accuracy, though its stability varies across architectures. (e) Error distributions of OOD samples show broader tails than those of ID samples. (f) A multi-metric comparison highlights architecture-specific failure modes, with different models degrading along different structural dimensions. Shaded regions denote ID and OOD radii, and error bands indicate $\pm 1$ standard deviation across materials and seeds.
Figure 3: Power-law scaling relationships quantify the extrapolation frontier. A log–log plot of RMSD versus atom count shows approximate power-law behavior, $\mathrm{RMSD} \sim N^\alpha$. Solid lines denote fits on in-distribution data, while dashed lines extrapolate into out-of-distribution regimes. Models with consistent scaling exhibit predictable extrapolation, whereas deviations indicate irregular or less predictable behavior beyond the training regime.
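The fit-then-extrapolate procedure in Figure 3 can be sketched as an ordinary least-squares line in log-log space. This is a minimal sketch under stated assumptions: plain OLS on $\log N$ versus $\log \mathrm{RMSD}$ is our choice (the paper's exact fitting procedure is not given here), `fit_power_law` and `forecast` are hypothetical names, and the data are synthetic, not the paper's measurements.

```python
import numpy as np


def fit_power_law(n_atoms: np.ndarray, rmsd: np.ndarray) -> tuple[float, float]:
    """Fit log(RMSD) = log(c) + alpha * log(N) by least squares; return (alpha, c)."""
    alpha, log_c = np.polyfit(np.log(n_atoms), np.log(rmsd), deg=1)  # [slope, intercept]
    return float(alpha), float(np.exp(log_c))


def forecast(n_atoms, alpha: float, c: float) -> np.ndarray:
    """Extrapolate RMSD = c * N**alpha to new atom counts."""
    return c * np.asarray(n_atoms, dtype=float) ** alpha


# Synthetic data only: a known power law with small multiplicative noise on the ID points.
rng = np.random.default_rng(0)
n_id = np.geomspace(55, 2000, 8)       # synthetic "in-distribution" atom counts
n_ood = np.geomspace(3000, 11298, 4)   # synthetic "out-of-distribution" atom counts
true_alpha, true_c = 1 / 3, 0.05       # synthetic ground truth, not paper results
rmsd_id = true_c * n_id ** true_alpha * np.exp(rng.normal(scale=0.02, size=n_id.size))
rmsd_ood = true_c * n_ood ** true_alpha

alpha, c = fit_power_law(n_id, rmsd_id)
rel_err = np.abs(forecast(n_ood, alpha, c) - rmsd_ood) / rmsd_ood
print(f"alpha = {alpha:.3f}, max OOD relative error = {rel_err.max():.3f}")
```

Fitting in log space turns the power law into a straight line, so the slope is directly the exponent $\alpha$. Whether such a forecast holds for a real model depends on whether its errors keep following a single power law beyond the training range, which is what separates the "well-behaved" models from the rest.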

Theorems & Definitions (1)

Proposition 1: Angular separation guarantee
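Proposition 1 concerns quaternions returned by a greedy procedure at angular spacing $\Delta\theta$, which this summary does not spell out. A minimal sketch, assuming the procedure draws random unit quaternions and keeps a candidate only if its rotation angle to every previously kept quaternion is at least $\Delta\theta$ (our assumption; `greedy_orientations` and the other names are hypothetical). The rotation angle between unit quaternions is $2\arccos(|q_i \cdot q_j|)$; the absolute value accounts for $q$ and $-q$ encoding the same rotation.

```python
import numpy as np


def random_unit_quaternions(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n unit quaternions uniformly over SO(3) by normalizing 4D Gaussian samples."""
    q = rng.normal(size=(n, 4))
    return q / np.linalg.norm(q, axis=1, keepdims=True)


def rotation_angles(q: np.ndarray, others: np.ndarray) -> np.ndarray:
    """Rotation angle (radians) between q and each row of `others`.

    Unit quaternions double-cover SO(3) (q and -q are the same rotation),
    hence the absolute value of the dot product.
    """
    dots = np.clip(np.abs(others @ q), 0.0, 1.0)
    return 2.0 * np.arccos(dots)


def greedy_orientations(delta_theta: float, n_candidates: int = 2000, seed: int = 0) -> np.ndarray:
    """Hypothetical greedy sampler: keep a random candidate only if it lies at least
    delta_theta (radians) from every orientation kept so far."""
    rng = np.random.default_rng(seed)
    kept = np.empty((0, 4))
    for cand in random_unit_quaternions(n_candidates, rng):
        # np.all over an empty array is True, so the first candidate is always kept.
        if np.all(rotation_angles(cand, kept) >= delta_theta):
            kept = np.vstack([kept, cand])
    return kept


delta = np.deg2rad(30.0)
Q = greedy_orientations(delta)
print(f"kept {len(Q)} orientations with pairwise separation >= 30 degrees")
```

In this sketch the pairwise separation holds by construction, since every accepted quaternion was checked against all earlier ones; the number of orientations kept depends on $\Delta\theta$ and on how many candidates are drawn.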
