Predicting Performance of Object Detection Models in Electron Microscopy Using Random Forests
Ni Li, Ryan Jacobs, Matthew Lynch, Vidit Agrawal, Kevin Field, Dane Morgan
TL;DR
The paper addresses the challenge of estimating object-detection performance on unlabeled TEM data by learning a mapping from features derived from Mask R-CNN outputs to the detection $F1$ score using a random forest regressor. This enables rapid, ground-truth-free performance predictions on new images and helps assess domain shift implications for defect detection in irradiated metal alloys. The authors validate their approach across three TEM datasets, achieving an overall MAE of $0.093$ and $R^2=0.774$, with SHAP analysis identifying key features such as high-confidence defect counts and confidence statistics as influential. Practically, the method provides guardrails for users to gauge applicability, identify when fine-tuning may be needed, and flag potential out-of-domain inputs, thereby enhancing the reliability of automated TEM defect detection workflows.
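For reference, the three metrics quoted above have the following standard definitions. This is a sketch under assumptions: the symbols $TP$, $FP$, $FN$ (true positive, false positive, and false negative detections), $y_i$ (true F1 of image $i$), $\hat{y}_i$ (predicted F1), and $\bar{y}$ (mean true F1) are introduced here for illustration, and the criterion for matching a predicted defect to a ground-truth defect is not specified in this summary.

$$F1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

As a hypothetical example, an image with $TP = 8$, $FP = 2$, and $FN = 2$ gives $F1 = 16/20 = 0.8$.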
Abstract
Quantifying prediction uncertainty when applying object detection models to new, unlabeled datasets is critical in applied machine learning. This study introduces an approach to estimate the performance of deep learning-based object detection models for quantifying defects in transmission electron microscopy (TEM) images, focusing on irradiation-induced cavities in metal alloys. We developed a random forest regression model that predicts the object detection F1 score, a statistical metric that evaluates how accurately a model locates and classifies objects of interest. The random forest model uses features extracted from the predictions of the object detection model whose uncertainty is being quantified, enabling fast prediction on new, unlabeled images. On test data, the trained model predicts F1 with a mean absolute error (MAE) of 0.09 and an $R^2$ score of 0.77, indicating a significant correlation between the predicted and true defect detection F1 scores. The approach is shown to be robust across three distinct TEM image datasets with varying imaging and material domains. Our approach enables users to estimate the reliability of a defect detection and segmentation model's predictions and to assess the applicability of the model to their specific datasets, providing valuable information about possible domain shifts and whether the model needs to be fine-tuned or trained on additional data to be maximally effective for the desired use case.
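The pipeline described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the four features, the 0.9 confidence threshold, and the forest hyperparameters are all assumptions chosen for the example. It summarizes one image's detection confidence scores as a fixed-length feature vector, then fits a regressor from such vectors to F1 scores measured on labeled images.

```python
from statistics import mean, pstdev

# Assumed cutoff for a "high-confidence" detection; the abstract does not give one.
HIGH_CONF = 0.9


def detection_features(scores, high_conf=HIGH_CONF):
    """Summarize one image's detection confidences as a feature vector.

    Returns [number of detections, number of high-confidence detections,
    mean confidence, population standard deviation of confidence].
    This feature set is hypothetical, not the one used in the paper.
    """
    if not scores:
        # No detections: every feature is zero.
        return [0, 0, 0.0, 0.0]
    return [
        len(scores),
        sum(s >= high_conf for s in scores),
        mean(scores),
        pstdev(scores),
    ]


def fit_f1_regressor(feature_rows, f1_scores, seed=0):
    """Fit a random forest mapping feature vectors to measured F1 scores.

    scikit-learn is imported here so the feature code above runs without it.
    The hyperparameters are placeholders, not values from the paper.
    """
    from sklearn.ensemble import RandomForestRegressor

    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(feature_rows, f1_scores)
    return model
```

Once fitted on labeled images, the model's `predict` method can be called on `detection_features(...)` of an unlabeled image to obtain an estimated F1 without ground truth.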
