Adapting Segment Anything Model (SAM) to Experimental Datasets via Fine-Tuning on GAN-based Simulation: A Case Study in Additive Manufacturing

Anika Tabassum, Amirkoushyar Ziabari

TL;DR

This work tackles semantic segmentation of XCT imagery from additive manufacturing parts, where noise, sparse views, and domain shift hinder standard models. It presents a domain-adaptation pipeline that fine-tunes SAM using Conv-LoRa (a parameter-efficient MoE-based approach) and augments training with GAN-generated XCT volumes. Across in-distribution (InD) and out-of-distribution (OoD) scenarios, SAM-GAN improves segmentation metrics over a CycleGAN-trained 2.5D U‑Net. Re-fine-tuning on real data can recover some OoD performance but may cause catastrophic forgetting and reduce InD accuracy. The study highlights the potential and limitations of large foundational models for domain-specific materials imaging, pointing to future work on robust multi-class, 3D-capable, few-shot segmentation strategies for industrial inspection.

Abstract

Industrial X-ray computed tomography (XCT) is a powerful tool for non-destructive characterization of materials and manufactured components. XCT is commonly accompanied by advanced image analysis and computer vision algorithms to extract relevant information from the images. Traditional computer vision models often struggle due to noise, resolution variability, and complex internal structures, particularly in scientific imaging applications. State-of-the-art foundational models such as the Segment Anything Model (SAM), which is designed for general-purpose image segmentation, have revolutionized image segmentation across various domains, yet their application in specialized fields like materials science remains under-explored. In this work, we explore the application and limitations of SAM for industrial X-ray CT inspection of additive manufacturing components. We demonstrate that while SAM shows promise, it struggles with out-of-distribution data, multiclass segmentation, and computational efficiency during fine-tuning. To address these issues, we propose a fine-tuning strategy utilizing parameter-efficient techniques, specifically Conv-LoRa, to adapt SAM for material-specific datasets. Additionally, we leverage generative adversarial network (GAN)-generated data to enhance the training process and improve the model's segmentation performance on complex X-ray CT data. Our experimental results highlight the importance of tailored segmentation models for accurate inspection, showing that fine-tuning SAM on domain-specific scientific imaging data significantly improves performance. However, despite these improvements, the model's ability to generalize across diverse datasets remains limited, highlighting the need for further research into robust, scalable solutions for domain-specific segmentation tasks.
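
To illustrate why low-rank adapters keep fine-tuning cheap, the sketch below shows a plain LoRA-style update on a single frozen linear layer. It is not Conv-LoRa itself, which this page describes as an MoE-based low-rank structure; only the low-rank part is shown. All names and sizes (`lora_forward`, `d_in`, `d_out`, `rank`) are hypothetical and chosen for illustration.

```python
import numpy as np

def lora_forward(x, w_frozen, a, b, scale=1.0):
    """Frozen linear layer plus a trainable low-rank correction.

    x:        (batch, d_in) inputs
    w_frozen: (d_out, d_in) pretrained weight, never updated
    a:        (rank, d_in)  trainable down-projection
    b:        (d_out, rank) trainable up-projection
    """
    base = x @ w_frozen.T            # output of the frozen pretrained layer
    update = (x @ a.T) @ b.T         # low-rank path: d_in -> rank -> d_out
    return base + scale * update

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 32, 4        # illustrative sizes, not taken from the paper

w_frozen = rng.normal(size=(d_out, d_in))
a = rng.normal(size=(rank, d_in)) * 0.01
b = np.zeros((d_out, rank))          # zero init: the adapter starts as a no-op

x = rng.normal(size=(8, d_in))
y = lora_forward(x, w_frozen, a, b)

trainable_params = a.size + b.size   # parameters updated during fine-tuning
frozen_params = w_frozen.size        # parameters left untouched
```

With these sizes the adapter trains 384 parameters against 2048 frozen ones, and because `b` starts at zero the adapted layer initially reproduces the pretrained output exactly.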

Paper Structure

This paper contains 15 sections, 2 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: An example cross section (slice) comparison from the 3D volume of an additive manufacturing (AM) component. a) Experimental XCT reconstruction; b) Ground truth segmentation (obtained through a higher-resolution, better-quality scan of the same object, which is quite expensive in time and cost to acquire every time). Four classes (air, material, pore, and inclusion) are identified in the ground truth image. Predictions obtained by c) pretrained and d) fine-tuned SAM. An expanded view of the ROI (red dashed box in panel a) for each of the panels a-d is shown in e-h, respectively.
  • Figure 2: A summary of training (Tr-*) and testing (Te-*) data and their comparisons. GAN-generated data based on two metallic materials and different scan settings are shown in a) Tr-1 and b) Tr-2. Panels e-j show 6 crops from test data Te-1 through Te-6: Te-1: OoD test sample (Real, with more inclusions and fewer pores, and a slightly different noise distribution); Te-2: OoD test sample (Real, with more pores and fewer inclusions, and a different noise distribution); Te-3: InD test sample (with more inclusions and fewer pores, but similar noise distribution); Te-4: OoD test sample (Real, with more pores, no inclusions, and significantly less noise); Te-5: OoD test sample (Synthetic, with more pores, no inclusions, and a slightly different noise distribution); Te-6: OoD test sample (Real, with more pores, inclusions, and increased streaking noise). Panel (c) is an error-bar plot of Fréchet Inception Distance (FID) comparing test data against training data. A larger FID score indicates stronger OoD characteristics. Overall, the OoD data cover scenarios ranging from entirely different from both training datasets to being similar to one but not the other. In panel (d), we plotted pore and inclusion volume density (i.e., the proportion of the sample’s volume consisting of defects). Pore density reached up to 3% in Tr-1, while inclusion density was under 1%. All test data exhibit lower inclusion density between 0-0.25%, but pore density can reach up to 6%, making detection harder and impacting performance.
  • Figure 3: Parameter-efficient fine-tuned SAM (PEFT-SAM) for multiclass segmentation. (a) Architecture of Conv-LoRa utilizing three components of SAM (image encoder, prompt encoder, mask decoder). Blue denotes frozen weights of SAM; green denotes trainable parameters for fine-tuning. The parallel encoder-decoder is the MoE-based low-rank structure that Conv-LoRa uses. (b) Inference pipeline for multiclass predicted masks, where a PEFT-SAM is fine-tuned for each class in XCT using a binary mask for that class. Finally, a post-processing aggregator combines the per-class masks into a multiclass segmentation.
  • Figure 4: IoU scores of SAM fine-tuned models compared to the baseline on test datasets for (a) material, (b) pores, and (c) inclusions. Error bars (black) represent the standard deviation. The x-axis shows the test data (from left to right, Te-1 to Te-5). The IoU of SAM-GAN on all 3 classes is inversely related to the FID score of the OoD data (shown in Fig. 2(c)): SAM-GAN tends to show low IoU for OoD datasets with high FID relative to the training data. Higher IoU is better. Performance in terms of mean F1-score for (d) pores and (e) inclusions is analogous to panels (a)-(c). SAM-GAN outperforms the baseline on InD data and on OoD datasets with no inclusions. Similar to panels (a)-(c), SAM-GAN shows a higher F1-score for pores when the FID is low, and for inclusions when the volume of inclusions in the test data is high. A higher F1-score is better.
  • Figure 5: Performance of fine-tuned SAM based on GAN data on InD (Te-3) and OoD tasks (Te-1,2,4,5). Each column corresponds to the test number, and each row shows the input XCT image, ground truth (GT), and the output of the fine-tuned 2.5D U-Net and SAM models (both trained on GAN-generated data). While the fine-tuned model clearly captures materials, pores, and inclusions for InD and weak OoD cases, it struggles to recognize smoother material regions in strong OoD tasks. However, the SAM model remains consistent in detecting pores and inclusions across the different InD and OoD datasets.
  • ...and 5 more figures
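
The captions above refer to a per-class mask aggregator (Figure 3b), IoU and F1 scores (Figure 4), and defect volume density (Figure 2d). The following is a minimal sketch of how such quantities could be computed. The aggregation rule (highest per-class probability wins, air as fallback), the class ids, and the definition of sample volume as all non-air voxels are assumptions made for illustration; the page does not specify them, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical integer ids for the four classes named in Figure 1.
CLASS_IDS = {"air": 0, "material": 1, "pore": 2, "inclusion": 3}

def aggregate_masks(prob_maps, threshold=0.5):
    """Merge per-class foreground probabilities into one label map.

    Assumed rule: each pixel takes the class with the highest probability;
    pixels where no class reaches `threshold` fall back to air.
    """
    names = list(prob_maps)
    stack = np.stack([np.asarray(prob_maps[n], dtype=float) for n in names])
    ids = np.array([CLASS_IDS[n] for n in names])
    labels = ids[stack.argmax(axis=0)]               # winning class per pixel
    labels[stack.max(axis=0) < threshold] = CLASS_IDS["air"]
    return labels

def iou_and_f1(pred, truth, class_id):
    """Intersection-over-union and F1 (Dice) for one class."""
    p, t = pred == class_id, truth == class_id
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    total = p.sum() + t.sum()
    iou = inter / union if union else float("nan")
    f1 = 2 * inter / total if total else float("nan")
    return iou, f1

def volume_density(labels, class_id):
    """Fraction of the sample occupied by one class.

    Assumption: the sample is every non-air voxel.
    """
    sample = (labels != CLASS_IDS["air"]).sum()
    return (labels == class_id).sum() / sample if sample else float("nan")

# Tiny 2x3 example with made-up probabilities.
truth = np.array([[1, 1, 2],
                  [1, 3, 0]])
pred = aggregate_masks({
    "material":  [[0.9, 0.8, 0.2], [0.9, 0.3, 0.1]],
    "pore":      [[0.1, 0.1, 0.7], [0.2, 0.6, 0.2]],
    "inclusion": [[0.0, 0.1, 0.1], [0.1, 0.4, 0.3]],
})
pore_iou, pore_f1 = iou_and_f1(pred, truth, CLASS_IDS["pore"])
pore_density = volume_density(truth, CLASS_IDS["pore"])
```

In this toy case the inclusion pixel is mislabelled as a pore, so the pore IoU drops to 0.5 while the material IoU stays at 1.0. Small defect classes are sensitive to single-pixel errors in the same way.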