Predicting Performance of Object Detection Models in Electron Microscopy Using Random Forests

Ni Li; Ryan Jacobs; Matthew Lynch; Vidit Agrawal; Kevin Field; Dane Morgan

TL;DR

The paper addresses the challenge of estimating object-detection performance on unlabeled TEM data by learning a mapping from features derived from Mask R-CNN outputs to the detection $F1$ score using a random forest regressor. This enables rapid, ground-truth-free performance predictions on new images and helps assess domain shift implications for defect detection in irradiated metal alloys. The authors validate their approach across three TEM datasets, achieving an overall MAE of $0.093$ and $R^2=0.774$, with SHAP analysis identifying key features such as high-confidence defect counts and confidence statistics as influential. Practically, the method provides guardrails for users to gauge applicability, identify when fine-tuning may be needed, and flag potential out-of-domain inputs, thereby enhancing the reliability of automated TEM defect detection workflows.
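
As a rough illustration of the kind of features mentioned above (high-confidence defect counts and confidence statistics), the sketch below summarizes one image's detection confidences as a flat feature dictionary. The function name, the feature names, and the 0.9 threshold are assumptions made for this example; they are not taken from the paper.

```python
from statistics import mean, pstdev


def confidence_features(scores, high_conf_threshold=0.9):
    """Summarize one image's detection confidence scores as features.

    `scores` is a list of per-detection confidences in [0, 1], such as
    those returned by an object detector. The threshold and the feature
    names are illustrative assumptions, not the paper's definitions.
    """
    if not scores:
        # No detections: report zeros so every image yields the same keys.
        return {
            "n_detections": 0,
            "n_high_conf": 0,
            "mean_conf": 0.0,
            "std_conf": 0.0,
            "min_conf": 0.0,
        }
    return {
        "n_detections": len(scores),
        "n_high_conf": sum(1 for s in scores if s >= high_conf_threshold),
        "mean_conf": mean(scores),
        "std_conf": pstdev(scores),  # population standard deviation
        "min_conf": min(scores),
    }


# Made-up confidences for a single image.
features = confidence_features([0.98, 0.95, 0.72, 0.51])
```

One feature dictionary per image would form one row of the regression input. A regressor such as scikit-learn's `RandomForestRegressor` could then be fit on these rows against the F1 scores measured on labeled images.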

Abstract

Quantifying prediction uncertainty when applying object detection models to new, unlabeled datasets is critical in applied machine learning. This study introduces an approach to estimate the performance of deep learning-based object detection models for quantifying defects in transmission electron microscopy (TEM) images, focusing on detecting irradiation-induced cavities in TEM images of metal alloys. We developed a random forest regression model that predicts the object detection F1 score, a statistical metric used to evaluate the ability to accurately locate and classify objects of interest. The random forest model uses features extracted from the predictions of the object detection model whose uncertainty is being quantified, enabling fast prediction on new, unlabeled images. On test data, the trained model predicts F1 with a mean absolute error (MAE) of 0.09 and an $R^2$ score of 0.77, indicating a significant correlation between the F1 scores predicted by the random forest regression model and the true defect detection F1 scores. The approach is shown to be robust across three distinct TEM image datasets with varying imaging and material domains. Our approach enables users to estimate the reliability of a defect detection and segmentation model's predictions and assess the applicability of the model to their specific datasets, providing valuable information about possible domain shifts and whether the model needs to be fine-tuned or trained on additional data to be maximally effective for the desired use case.
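
The metrics named in the abstract can be made concrete with a short sketch. It uses the standard definitions of F1 (from true positive, false positive, and false negative counts), MAE, and $R^2$. The numbers are made up for illustration, and returning 0.0 when an image has no detections and no ground-truth objects is an assumed convention, not necessarily the paper's.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0.0 when there is nothing to detect or match.
    return 2 * tp / denom if denom else 0.0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# One image with 8 matched defects, 2 spurious detections, 2 missed defects.
image_f1 = f1_score(tp=8, fp=2, fn=2)

# Made-up true and predicted per-image F1 scores.
true_f1 = [0.8, 0.6, 0.9, 0.5]
predicted_f1 = [0.7, 0.65, 0.85, 0.6]
mae = mean_absolute_error(true_f1, predicted_f1)
r2 = r2_score(true_f1, predicted_f1)
```

Here MAE and $R^2$ compare the regressor's predicted F1 with the F1 measured against ground-truth labels, which is how the reported 0.09 and 0.77 should be read.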

Paper Structure (7 sections, 1 equation, 9 figures, 3 tables)

Introduction
Data and Methods
Data acquisition
Mask R-CNN model and assessment
Random forest model and assessment
Feature engineering
Results and Discussion

Figures (9)

Figure 1: Workflow diagram illustrating the process of estimating the defect detection F1 score using a trained Mask R-CNN model. The procedure includes using the trained Mask R-CNN to identify defects in TEM images, extracting key features from the predicted defects, and utilizing a Random Forest Regression model to predict the F1 score, thereby estimating the performance without the need for ground truth labels.
Figure 2: Data Generation and Utilization Workflow. This flowchart illustrates the sequential steps undertaken in our study, starting from the collection of TEM images, through the training and evaluation of the Mask R-CNN model, to the feature extraction and final F1 score prediction using Random Forest regression. The data is distinctly categorized for Mask R-CNN training and testing, followed by a five-fold cross-validation scheme applied in the Random Forest training phase, highlighting the two experimental setups: consistent source (random splits) and varied source (grouped splits) between training and testing datasets.
Figure 3: SHAP value analysis of all feature candidates.
Figure 4: RMSE, $R^2$, and MAE of the trained random forest model on test data as a function of the number of features used in the model.
Figure 5: (a) Histogram of defect find F1 scores. (b) The average defect find F1 scores with standard deviation error bars for different subsets of data. The dashed green and blue lines represent the average defect find F1 scores across all grouped and random split data, respectively. The green and blue shades depict the standard deviation over all data from grouped splits and random splits, respectively. Data labels indicate the different splits of training and testing datasets.
...and 4 more figures
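
The two experimental setups described in the Figure 2 caption, random splits and grouped splits, can be sketched as follows. This is a minimal standard-library illustration, assuming each image carries a source label; the function names and source labels are hypothetical, and the paper's actual fold assignment may differ. scikit-learn's `KFold` and `GroupKFold` offer comparable behavior.

```python
import random


def random_folds(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k folds (consistent-source setup)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grouped_folds(groups, k=5):
    """Assign whole groups to folds so no source is split across folds."""
    unique_groups = sorted(set(groups))
    fold_of_group = {g: i % k for i, g in enumerate(unique_groups)}
    folds = [[] for _ in range(k)]
    for index, group in enumerate(groups):
        folds[fold_of_group[group]].append(index)
    return folds


# Hypothetical source labels: 5 image sources with 4 images each.
sources = [name for name in "ABCDE" for _ in range(4)]

rand = random_folds(len(sources))
grouped = grouped_folds(sources)

# In the grouped setup, the held-out fold shares no source with the training folds.
test_fold = grouped[0]
train_sources = {sources[i] for fold in grouped[1:] for i in fold}
test_sources = {sources[i] for i in test_fold}
```

With random splits, images from the same source can land in both training and test folds. With grouped splits, each test fold comes from sources the regressor has not seen, which is the harder, varied-source case.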
