Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades
Samuel Del Fré, Andrée de Backer, Christophe Domain, Ludovic Thuinet, Charlotte S. Becquart
TL;DR
This work tackles the challenge of characterizing primary radiation damage from displacement cascades by introducing a fully unsupervised pipeline that maps defect morphologies directly from molecular dynamics data. The method encodes local atomic environments with SOAP descriptors, detects anomalous atoms with autoencoders, and organizes the anomalous data with UMAP followed by HDBSCAN clustering, identifying coherent defect motifs in Ni, FeNiCr, and Zr without labeled data. The approach yields tight, physically meaningful groupings and enables a calibration between defect content and outlier counts ($R^2 > 0.89$). It shows strong overlap with, and coverage complementary to, conventional detectors such as the dislocation extraction algorithm (DXA) and centrosymmetry (CS), while offering template-free insights beyond polyhedral template matching (PTM). The framework provides a scalable, interpretable tool for mapping irradiation-induced defects and can be extended to larger cascade datasets, other materials, and different recoil energies, facilitating data-driven materials design for radiation environments.
Abstract
Neutron irradiation produces, within a few picoseconds, displacement cascades: sequences of atomic collisions that generate point and extended defects, which subsequently affect the long-term evolution of materials. The diversity of these defects, characterized morphologically and statistically, defines what is called the "primary damage". In this work, we present a fully unsupervised machine learning (ML) workflow that detects and classifies these defects directly from molecular dynamics data. Local environments are encoded by the Smooth Overlap of Atomic Positions (SOAP) vector; anomalous atoms are isolated with autoencoder (AE) neural networks, then embedded with Uniform Manifold Approximation and Projection (UMAP) and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Applied to 80 keV displacement cascades in Ni, Fe$_{70}$Ni$_{10}$Cr$_{20}$, and Zr, the AEs successfully identify the small fraction of outlier atoms that participate in defect formation. HDBSCAN then partitions the UMAP latent space of AE-flagged SOAP descriptors into well-defined groups representing vacancy- and interstitial-dominated regions and, within each, separates small from large aggregates, assigning 99.7% of outliers to compact physical motifs. A signed cluster-identification score confirms this separation, and cluster size scales with net defect counts ($R^2 > 0.89$). Statistical cross-analyses between the ML outlier map and several conventional detectors (centrosymmetry, dislocation extraction, etc.) reveal strong overlap and complementary coverage, all achieved without template or threshold tuning. This ML workflow thus provides an efficient tool for the quantitative mapping of structural anomalies in materials, particularly those arising from irradiation damage in displacement cascades.
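To make the reported scaling between cluster size and net defect count concrete, the following minimal sketch shows how a coefficient of determination of this kind could be computed. It assumes an ordinary least-squares linear fit, which the abstract does not specify. The function name `linear_fit_r2` and the numbers in `cluster_sizes` and `net_defects` are hypothetical illustration values, not data from this work.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y ~ a*x + b; returns (a, b, R^2).

    Assumption: a straight-line calibration. The actual functional
    form used in the paper is not stated in the abstract.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    a = sxy / sxx                      # slope
    b = my - a * mx                    # intercept
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy            # coefficient of determination
    return a, b, r2


# Hypothetical illustration data (NOT taken from the paper):
# number of outlier atoms per cluster vs. net defect count.
cluster_sizes = [12, 30, 55, 90, 140]
net_defects = [1, 3, 5, 9, 13]

slope, intercept, r2 = linear_fit_r2(cluster_sizes, net_defects)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r2:.4f}")
```

With a fit of this kind, the slope would act as a conversion factor from outlier count to net defect count, and $R^2$ would measure how much of the variance in defect content the outlier count explains.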
