Simulation of microstructures and machine learning
Katja Schladitz, Claudia Redenbach, Tin Barisin, Christian Jung, Natascha Jeziorski, Lovro Bosnar, Juraj Fulir, Petra Gospodnetić
TL;DR
The paper addresses data scarcity in machine learning for microstructure imaging and surface inspection by leveraging synthetic data from stochastic geometry. It analyzes three use cases: reconstruction of highly porous structures from FIB-SEM, crack segmentation in 3D concrete CT, and optical surface defect inspection. It demonstrates training ML models on synthetic data, including a 3D U-Net for porosity reconstruction and a novel scale-invariant RieszNet for cracks, highlighting both the benefits and the remaining generalization challenges. The discussion points to open questions on realism, domain gap quantification, and the need for dedicated metrics and rendering choices to reliably transfer synthetic training to real data.
Abstract
Machine learning offers attractive solutions to challenging image processing tasks. Tedious development and parametrization of algorithmic solutions can be replaced by training a convolutional neural network or a random forest with a high potential to generalize. However, machine learning methods rely on huge amounts of representative image data along with a ground truth, usually obtained by manual annotation. Thus, limited availability of training data is a critical bottleneck. We discuss two use cases: optical quality control in industrial production and segmenting crack structures in 3D images of concrete. For optical quality control, all defect types have to be trained but are typically not evenly represented in the training data. Additionally, manual annotation is costly and often inconsistent. In the second case, segmentation of crack systems in 3D images of concrete, manual annotation is nearly impossible. Synthetic images, generated from realizations of stochastic geometry models, offer an elegant way out. A wide variety of structure types can be generated. The within-structure variation is naturally captured by the stochastic nature of the models, and the ground truth comes for free. Many new questions arise, in particular: which characteristics of the real image data have to be met, and to what degree of fidelity?
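To illustrate why the ground truth comes for free, the following minimal sketch draws one realization of a 2D Boolean model of discs and derives a noisy grayscale image from it. This is not the authors' pipeline: the function name `boolean_model_2d`, the disc-shaped grains, all parameter values, and the additive Gaussian noise standing in for an imaging model are assumptions chosen for illustration only.

```python
import numpy as np


def boolean_model_2d(shape=(128, 128), intensity=2e-3,
                     radius_range=(3.0, 8.0), noise_sigma=0.1, seed=0):
    """Hypothetical helper: one realization of a Boolean model of discs.

    Returns a noisy grayscale image and the exact binary mask it was
    rendered from. All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    h, w = shape

    # Germs: a Poisson number of points, uniformly placed in the window.
    n = rng.poisson(intensity * h * w)
    cy = rng.uniform(0, h, n)
    cx = rng.uniform(0, w, n)

    # Grains: discs with independent, uniformly distributed radii.
    radii = rng.uniform(*radius_range, n)

    # The union of all discs is the structure; it doubles as the label.
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=bool)
    for y0, x0, r0 in zip(cy, cx, radii):
        mask |= (yy - y0) ** 2 + (xx - x0) ** 2 <= r0 ** 2

    # Crude stand-in for image formation: additive Gaussian noise.
    noise = rng.normal(0.0, noise_sigma, shape)
    image = (mask + noise).astype(np.float32)
    return image, mask


image, mask = boolean_model_2d()
```

Each call with a different `seed` yields a new image together with its pixel-exact label `mask`, so no manual annotation is involved. How closely the rendering step has to mimic the real imaging process is exactly the fidelity question raised in the abstract.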
