Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades

Samuel Del Fré, Andrée de Backer, Christophe Domain, Ludovic Thuinet, Charlotte S. Becquart

TL;DR

This work tackles the challenge of characterizing primary radiation damage from displacement cascades by introducing a fully unsupervised pipeline that maps defect morphologies directly from molecular dynamics data. The method encodes local atomic environments with SOAP descriptors, detects anomalous atoms with autoencoders, and organizes the anomalous data with UMAP followed by HDBSCAN clustering. It identifies coherent defect motifs without labeled data across Ni, FeNiCr, and Zr. The resulting groups are compact and physically meaningful, and outlier counts can be calibrated against defect content ($R^2 > 0.89$). The outlier map overlaps strongly with, yet complements, conventional detectors such as the dislocation extraction algorithm (DXA) and centrosymmetry (CS), and it offers template-free insights beyond polyhedral template matching (PTM). The framework provides a scalable, interpretable tool for mapping irradiation-induced defects and can be extended to larger cascade datasets, other materials, and different recoil energies, facilitating data-driven materials design for radiation environments.

Abstract

Neutron irradiation produces, within a few picoseconds, displacement cascades: sequences of atomic collisions that generate point and extended defects, which subsequently affect the long-term evolution of materials. The diversity of these defects, characterized morphologically and statistically, defines what is called the "primary damage". In this work, we present a fully unsupervised machine learning (ML) workflow that detects and classifies these defects directly from molecular dynamics data. Local environments are encoded by the Smooth Overlap of Atomic Positions (SOAP) vector, anomalous atoms are isolated with autoencoder neural networks (AE), and the resulting outliers are embedded with Uniform Manifold Approximation and Projection (UMAP) and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Applied to 80 keV displacement cascades in Ni, Fe$_{70}$Ni$_{10}$Cr$_{20}$, and Zr, the AEs successfully identify the small fraction of outlier atoms that participate in defect formation. HDBSCAN then partitions the UMAP latent space of AE-flagged SOAP descriptors into well-defined groups representing vacancy- and interstitial-dominated regions and, within each, separates small from large aggregates, assigning 99.7 % of outliers to compact physical motifs. A signed cluster-identification score confirms this separation, and cluster size scales with net defect counts ($R^2 > 0.89$). Statistical cross analyses between the ML outlier map and several conventional detectors (centrosymmetry, dislocation extraction, etc.) reveal strong overlap and complementary coverage, all achieved without template or threshold tuning. This ML workflow thus provides an efficient tool for the quantitative mapping of structural anomalies in materials, particularly those arising from irradiation damage in displacement cascades.
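
The outlier-isolation step can be illustrated with a minimal sketch, given here under stated assumptions rather than as the authors' implementation. It takes per-atom descriptor vectors and their autoencoder reconstructions as plain lists, computes the per-atom mean squared error (MSE), and flags atoms whose error exceeds a threshold. The function names `reconstruction_mse` and `flag_outliers`, the toy three-component vectors, and the hard-coded reconstructions are all hypothetical; real SOAP vectors are much longer and the reconstructions would come from a trained network.

```python
from typing import List, Sequence


def reconstruction_mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a descriptor and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("descriptor and reconstruction must have equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_outliers(
    descriptors: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Return indices of atoms whose reconstruction MSE exceeds the threshold."""
    return [
        i
        for i, (x, x_hat) in enumerate(zip(descriptors, reconstructions))
        if reconstruction_mse(x, x_hat) > threshold
    ]


# Toy data (hypothetical): atoms 0 and 1 resemble the reference environment,
# atom 2 is poorly reconstructed and should be flagged.
descriptors = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [4.0, 0.0, 8.0]]
reconstructions = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.0], [1.0, 2.0, 3.0]]

outliers = flag_outliers(descriptors, reconstructions, threshold=5.0)
print(outliers)
```

Only the flagged indices would be passed on to the embedding and clustering stages; atoms below the threshold are treated as inliers and dropped from further analysis.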

Paper Structure

This paper contains 19 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Schematic representation of the two-branch workflow (A and B) for ML-assisted defect mapping described in section \ref{ML}. SOAP descriptors are computed (A.1) for a defect-free MD reference structure (fcc or hcp, depending on the material) to train an autoencoder neural network (A.2). For each cascade snapshot, SOAP descriptors are computed using the same parameters as for the reference structure (B.1). Each SOAP vector is then passed through the trained AE (B.2), and a reconstruction error (MSE) threshold is selected (B.3). Based on the per-atom reconstruction error, atoms are classified as inliers (MSE < threshold) or as outliers forming defect neighbourhoods (MSE > threshold) (B.4). Outlier atoms are then embedded with UMAP (B.5) and grouped with HDBSCAN (B.6) to yield unsupervised defect-type groups.
  • Figure 2: Log–log histogram of per-atom reconstruction errors for the Ni dataset, obtained from the autoencoder-based analysis across multiple cascade simulations. The histogram is computed using logarithmically spaced bins, and the vertical axis is also plotted on a logarithmic scale to highlight variations across several orders of magnitude. The dashed vertical line indicates the selected threshold (5.0 in this case), which lies near an inflection in the distribution, separating low-error atoms from a high-error tail linked to defective environments.
  • Figure 3: Example of outlier detection in FeNiCr (left), Ni (middle) and Zr (right) displacement cascades (step B.4, see Figure \ref{Fig0_diag}). Atoms whose autoencoder reconstruction error is below 5.0 (fcc systems) or 2.0 (hcp system) are omitted from the view; only atoms flagged as outliers are shown in solid colors.
  • Figure 4: (a) Two-dimensional UMAP projection of latent-space SOAP descriptors for outliers in FeNiCr, Ni and Zr systems, colored by HDBSCAN group labels. For FeNiCr and Ni, points labeled as -1 (black) represent samples that HDBSCAN did not assign to any group. Note that the HDBSCAN labels are assigned independently for each material, meaning that the same label in different systems does not necessarily correspond to the same type of defect pattern. (b) Examples of representative atomic configurations associated with selected HDBSCAN groups from (a) (label -1 excluded), shown using the same color scheme. The relative size of each group, as a percentage of the outlier dataset, is also indicated. Transparent atoms represent outlier atoms associated with other HDBSCAN labels.
  • Figure 5: Cluster-identification (CID, see section \ref{spatial_clust} for description) diagnostics for the HDBSCAN groups, shown for one typical cascade in each of FeNiCr, Ni and Zr. For each material, the histogram displays the distribution of the variable $\mathrm{CID}=\mathrm{sign}(n_{\text{Def}})\times\mathrm{DefID}$, where $\mathrm{sign}(n_{\text{Def}})>0$ (triangles) denotes interstitial-dominated clusters and $\mathrm{sign}(n_{\text{Def}})<0$ (circles) denotes vacancy-dominated clusters. The vertical dotted line marks $\mathrm{CID}=0$. The magnitude $|\mathrm{CID}|$ is inversely proportional to aggregate size: small $|\mathrm{CID}|$ corresponds to large clusters and large $|\mathrm{CID}|$ to small clusters. Marker size is likewise inversely proportional to $|\mathrm{CID}|$. Hence, points far to the right represent small interstitial defects, whereas points far to the left correspond to small vacancy defects; values near the origin indicate the largest aggregates of either type. Bars are coloured according to the HDBSCAN labels defined in Fig. \ref{Fig2}(a).
  • ...and 3 more figures
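
The signed score in the Figure 5 caption, $\mathrm{CID}=\mathrm{sign}(n_{\text{Def}})\times\mathrm{DefID}$, can be sketched as follows. The caption does not define $\mathrm{DefID}$, so this sketch assumes it is a rank index assigned by decreasing aggregate size (1 for the largest aggregate), which reproduces the stated trend that small $|\mathrm{CID}|$ means large clusters. It also assumes that $n_{\text{Def}}$ is a net defect count (positive for interstitial-dominated, negative for vacancy-dominated clusters) and that $|n_{\text{Def}}|$ stands in for aggregate size. The paper's own definition may differ, and the function name `cluster_identification` is hypothetical.

```python
from typing import List, Sequence


def sign(n: int) -> int:
    """Return -1, 0 or +1 according to the sign of n."""
    return (n > 0) - (n < 0)


def cluster_identification(net_defects: Sequence[int]) -> List[int]:
    """Signed cluster-identification score for each spatial cluster.

    Assumption: DefID is the rank of the cluster when sorted by decreasing
    |n_def| (rank 1 = largest aggregate). Ties keep their input order.
    """
    order = sorted(range(len(net_defects)), key=lambda i: -abs(net_defects[i]))
    cid = [0] * len(net_defects)
    for rank, i in enumerate(order, start=1):
        cid[i] = sign(net_defects[i]) * rank
    return cid


# Toy net defect counts (hypothetical): negative = vacancy-dominated,
# positive = interstitial-dominated.
net_defects = [-40, 3, -1, 25]
cids = cluster_identification(net_defects)
print(cids)
```

With this convention, the largest vacancy aggregate sits closest to zero on the negative side, and the smallest clusters of either type receive the largest magnitudes.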