Context Determines Optimal Architecture in Materials Segmentation
Mingjian Lu, Pawan K. Tripathi, Mark Shteyn, Debargha Ganguly, Roger H. French, Vipin Chaudhary, Yinghui Wu
TL;DR
The paper addresses deployment gaps in materials image segmentation by showing that the optimal architecture varies across imaging modalities. It introduces a cross-modal configuration framework with three modules (Cross-Modal Configuration, Quality Feedback, and Expert Knowledge Integration) that standardize inputs, produce segmentations with reliability signals, and generate interpretability heatmaps and counterfactual explanations. Across seven datasets and six encoder-decoder configurations, architecture performance proves context-dependent: UNet is favored for high-contrast 2D surfaces and DeepLabv3+ for hard, multi-scale volumetric tasks. Forte-based out-of-distribution detection and expert-aligned explanations add deployment confidence. The framework lets researchers select an architecture suited to their imaging setup and assess model trustworthiness on new samples, supporting reliable, automated materials characterization.
Abstract
Segmentation architectures are typically benchmarked on single imaging modalities, obscuring deployment-relevant performance variations: an architecture optimal for one modality may underperform on another. We present a cross-modal evaluation framework for materials image segmentation spanning SEM, AFM, XCT, and optical microscopy. Our evaluation of six encoder-decoder combinations across seven datasets reveals that optimal architectures vary systematically by context: UNet excels for high-contrast 2D imaging while DeepLabv3+ is preferred for the hardest cases. The framework also provides deployment feedback via out-of-distribution detection and counterfactual explanations that reveal which microstructural features drive predictions. Together, the architecture guidance, reliability signals, and interpretability tools address a practical gap in materials characterization, where researchers lack tools to select architectures for their specific imaging setup or assess when models can be trusted on new samples.
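As an illustration of the per-context selection described above, the following minimal Python sketch picks the best-scoring encoder-decoder configuration for each dataset and checks whether a single configuration wins everywhere. The function names, the dataset and configuration labels, and the scores are hypothetical placeholders, not the paper's code or results; the score is assumed to be any higher-is-better segmentation metric.

```python
def best_config_per_dataset(scores):
    """Map each dataset to its (config, score) pair with the highest score.

    `scores` is a dict keyed by (dataset, config) with a higher-is-better
    metric as the value. The metric itself is an assumption of this sketch.
    """
    best = {}
    for (dataset, config), score in scores.items():
        if dataset not in best or score > best[dataset][1]:
            best[dataset] = (config, score)
    return best


def single_winner(best):
    """Return True if the same config is best on every dataset."""
    return len({config for config, _ in best.values()}) == 1


# Made-up placeholder scores for illustration only (not results from the paper).
scores = {
    ("dataset_A", "config_1"): 0.90,
    ("dataset_A", "config_2"): 0.85,
    ("dataset_B", "config_1"): 0.60,
    ("dataset_B", "config_2"): 0.70,
}

best = best_config_per_dataset(scores)
for dataset, (config, score) in sorted(best.items()):
    print(f"{dataset}: {config} ({score:.2f})")
print("single winner across datasets:", single_winner(best))
```

When `single_winner` returns False, as it does for these placeholder values, no one configuration can be recommended for all contexts, which is the situation the abstract describes.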
