An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation

Zheming Zuo, Joseph Smith, Jonathan Stonehouse, Boguslaw Obara

TL;DR

An Augmentation-based Model Re-adaptation Framework (AMRF) is proposed, which leverages data augmentation techniques during training to enhance the generalisation of segmentation models, allowing them to adapt to newly released datasets with temporal disparity.

Abstract

Image segmentation is a crucial task in computer vision, with wide-ranging applications in industry. The Segment Anything Model (SAM) has recently attracted intensive attention; however, its application in industrial inspection, particularly for segmenting commercial anti-counterfeit codes, remains challenging. Unlike open-source datasets, industrial settings often face issues such as small sample sizes and complex textures. Additionally, computational cost is a key concern due to the varying number of trainable parameters. To address these challenges, we propose an Augmentation-based Model Re-adaptation Framework (AMRF). This framework leverages data augmentation techniques during training to enhance the generalisation of segmentation models, allowing them to adapt to newly released datasets with temporal disparity. By observing segmentation masks from conventional models (FCN and U-Net) and a pre-trained SAM model, we determine a minimal augmentation set that optimally balances training efficiency and model performance. Our results demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02% in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two temporally continuous datasets. Similarly, the fine-tuned U-Net improves upon its baseline by 7.34% and 4.94% in cropping, and 8.02% and 5.52% in classification. Both models outperform the top-performing SAM models (ViT-Large and ViT-Base) by an average of 11.75% and 9.01% in cropping accuracy, and 2.93% and 4.83% in classification accuracy, respectively.

Paper Structure (21 sections, 2 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Unlike segmentation workflows that are (a) straightforward yet less informed, e.g. FCN \cite{long2015fully} and U-Net \cite{ronneberger2015u}, or (b) interactive and more informed, e.g. FocalClick \cite{chen2022focalclick}, PseudoClick \cite{liu2022pseudoclick} and SAM \cite{kirillov2023segment}, which use prompts (d), our augmentation-based model re-adaptation framework (c) embraces the concept of pseudo re-adaptation (Sec. \ref{sec:pr}) by interacting with the currently trained model, as depicted in (e). It minimises the number of candidates in the augmentation pool adopted in the training phase, using only the earliest released dataset, while maximising both segmentation and classification accuracy.
  • Figure 2: Sample images of the bottom side of commercial products produced by two factories, $F_1$ and $F_2$, in three temporally continuous datasets, i.e. $\mathcal{D}_{t_1}$, $\mathcal{D}_{t_2}$ and $\mathcal{D}_{t_3}$. The anti-counterfeit (ACF) code is the region of interest to be segmented and cropped.
  • Figure 3: Workflow of the proposed AMRF for robust image segmentation in industrial inspection. The figure highlights the enhanced augmentation pool, where candidate augmentation methods are included based on their ability to improve cropping and classification accuracies over the current baseline. Cropping accuracy is evaluated by segmenting the region of interest and adopting an angle-adaptive cropping component to align the ACF code horizontally (described in Sec. \ref{sec:angle}). Classification accuracy, in contrast, is assessed using an existing code classifier. The flame symbol highlights the trainable FCN model (trained using $\mathcal{D}_{t_1}^{\{F_{1},F_{2}\}}$ and the given GT binary masks), whereas the snowflake symbol denotes the SAM model with frozen weights. A key component of AMRF, pseudo re-adaptation, modifies the input image to align with the current model's perception range to inform the gradual expansion of the augmentation pool. $z$, $g$, $c$ and $r$ refer to the variable settings for the augmentation methods.
  • Figure 4: Detailed comparison of segmentation and cropping results between the pre-trained SAM and the FCN baseline with our proposed AMRF applied.
  • Figure 5: Cropping performance yielded by conventional segmentation networks (in conjunction with the proposed AMRF) and the latest prompt-based segmentation networks versus the associated classification accuracy of the cropped ACF codes using a pre-trained ResNet34 on the temporally continuous datasets $\mathcal{D}_{t_2}$ and $\mathcal{D}_{t_3}$ introduced in Sec. \ref{sec:ds}. The radius of each bubble denotes the number of trainable parameters in the model (U-Net: 35.37M, FCN: 51.94M, ViT-B: 93.74M, ViT-L: 312.34M, ViT-H: 641.09M). The colour denotes whether the model is a baseline model (grey), a model fine-tuned on our dataset (green) or a model given manually annotated references (purple).
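
The Figure 3 caption describes an augmentation pool that grows only when a candidate method improves cropping and classification accuracy over the current baseline. The sketch below shows one way such a selection loop could look. It is not the authors' implementation: the greedy one-pass order, the rule that both accuracies must improve, the function names (`expand_augmentation_pool`, `toy_evaluate`), the candidate names and every number in the toy evaluator are assumptions made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# (cropping accuracy, classification accuracy), both in [0, 1]
Scores = Tuple[float, float]


def expand_augmentation_pool(
    candidates: Iterable[str],
    evaluate: Callable[[List[str]], Scores],
) -> Tuple[List[str], Scores]:
    """Greedily grow the pool, one candidate at a time.

    Assumption: a candidate is kept only if training with it raises BOTH
    accuracies above the current baseline; the paper may use another rule.
    """
    pool: List[str] = []
    baseline = evaluate(pool)  # scores of the model trained without augmentation
    for name in candidates:
        trial = pool + [name]
        crop_acc, cls_acc = evaluate(trial)
        if crop_acc > baseline[0] and cls_acc > baseline[1]:
            pool = trial                    # accept the candidate
            baseline = (crop_acc, cls_acc)  # it becomes the new baseline
    return pool, baseline


def toy_evaluate(pool: List[str]) -> Scores:
    """Hypothetical stand-in for 'train, then measure cropping/classification'.

    The gains are invented placeholders, not results from the paper.
    """
    gains = {
        "aug_a": (0.02, 0.03),
        "aug_b": (0.01, -0.01),  # helps cropping but hurts classification
        "aug_c": (0.03, 0.02),
    }
    crop_acc = 0.80 + sum(gains[name][0] for name in pool)
    cls_acc = 0.85 + sum(gains[name][1] for name in pool)
    return crop_acc, cls_acc


pool, scores = expand_augmentation_pool(["aug_a", "aug_b", "aug_c"], toy_evaluate)
print(pool, scores)
```

With the toy evaluator, `aug_b` is rejected because it lowers classification accuracy, so the pool stays smaller than the full candidate list. In a real setting `evaluate` would retrain or fine-tune the segmentation model with the given augmentations and score it on held-out data, which makes each call expensive and is one reason a small pool is attractive.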