Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation

Geonuk Kim

TL;DR

To address the adaptability gap in industrial defect detection, the paper combines visual prompting with cycle-consistency uncertainty estimation to validate prompt–query relationships. It conducts forward and reverse prompting and gates predictions with a confidence score $p_c = p_f \times p_r \times \mathrm{mIoU}(m_s, m_r)$, enabling reliable one-shot defect segmentation. The approach achieves a yield rate of $0.9175$ on the VISION24 dataset without ensembles, reducing false positives and enhancing robustness. This work offers a practical pathway for scalable, adaptable defect detection in dynamic industrial environments.

Abstract

Industrial defect detection traditionally relies on supervised learning models trained on fixed datasets of known defect types. While effective within a closed set, these models struggle with new, unseen defects, necessitating frequent re-labeling and re-training. Recent advances in visual prompting offer a solution by allowing models to adaptively infer novel categories based on provided visual cues. However, a prevalent issue in these methods is the over-confidence problem, where models can misclassify unknown objects as known objects with high certainty. To address this fundamental concern about adaptability, we propose estimating the uncertainty of the visual prompting process through cycle-consistency. Specifically, we check whether the model can accurately restore the original prompt from its own predictions. To quantify this, we measure the mean Intersection over Union (mIoU) between the restored prompt mask and the originally provided prompt mask. Without using complex designs or ensemble methods with multiple networks, our approach achieved a yield rate of 0.9175 in the VISION24 one-shot industrial challenge.

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Given a support image with its corresponding prompt mask $m_{s}$ and a query image, the goal of the forward phase is to identify the regions in the query image that correspond to the prompt. In the context of segmentation, this process generates a mask $m_{f}$ and a probability $p_{f}$ for the query image. In the reverse phase, prompting inference is conducted in the opposite direction: the query image and the generated mask $m_{f}$ are treated as the support image and support mask, respectively, while the original support image is treated as the query image. This produces a mask $m_{r}$ and a probability $p_{r}$ for the pseudo query image. The mIoU between the original support mask $m_{s}$ and the support mask $m_{r}$ predicted during the reverse phase is then computed to quantify whether the model has made unbiased predictions in both the forward and reverse phases.
  • Figure 2: Examples of correct-yield samples corrected by cycle-consistency-based uncertainty estimation. The red mask in the top left represents the support image and its corresponding ground truth mask. The bottom left shows the query image. The green mask in the bottom right indicates the query mask inferred through the forward phase, while the blue mask in the top right represents the support mask restored through the reverse phase. In these samples, the support mask was not accurately restored due to model bias, and the $p_{c}$ score was lower than the pre-defined threshold, leading the model to convert the predicted mask $m_{f}$ to a null mask. In the 'Cable' example, the $p_{f}$ value is 0.977, indicating that the model predicted the mask very confidently. However, the mIoU between the restored support mask and the ground truth was only 0.048.
  • Figure 3: Examples of good-catch samples. The red mask in the top left represents the support image and its corresponding ground truth mask. The bottom left shows the query image. The green mask in the bottom right indicates the query mask inferred through the forward phase, while the blue mask in the top right represents the support mask restored through the reverse phase. In these samples, the support mask was accurately restored with high mIoU, and the $p_{c}$ score was higher than the pre-defined threshold, leading the model to accept the predicted mask $m_{f}$ as correct.
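
The gating rule described in the figure captions can be sketched as follows. This is a minimal illustration rather than the author's implementation. It assumes binary masks stored as NumPy arrays and a single defect class, so the mIoU reduces to one IoU value. The function names (`mask_iou`, `cycle_confidence`, `gate_prediction`), the handling of two empty masks, and the threshold of 0.5 are all hypothetical; the source only says the threshold is pre-defined.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection over Union of two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two empty masks give no evidence of consistency.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def cycle_confidence(p_f, p_r, m_s, m_r):
    """p_c = p_f * p_r * IoU(m_s, m_r), following the score in the TL;DR."""
    return p_f * p_r * mask_iou(m_s, m_r)


def gate_prediction(m_f, p_c, threshold):
    """Return m_f unchanged, or a null mask when p_c is below the threshold."""
    m_f = np.asarray(m_f)
    if p_c < threshold:
        return np.zeros_like(m_f)
    return m_f


# Toy example: the restored support mask is shifted one column to the right.
m_s = np.zeros((4, 4), dtype=bool)
m_s[1:3, 1:3] = True            # original support mask, 4 pixels
m_r = np.zeros((4, 4), dtype=bool)
m_r[1:3, 2:4] = True            # restored support mask, 4 pixels, 2 overlapping
m_f = np.ones((4, 4), dtype=bool)  # forward-phase query mask (placeholder)

p_c = cycle_confidence(p_f=0.9, p_r=0.8, m_s=m_s, m_r=m_r)
gated = gate_prediction(m_f, p_c, threshold=0.5)  # 0.5 is a hypothetical value
print(p_c, gated.any())
```

In the toy example the two masks share 2 of 6 pixels in their union, so the IoU is 1/3 and $p_c = 0.9 \times 0.8 \times 1/3 = 0.24$, which falls below the hypothetical threshold and turns the prediction into a null mask. The same reasoning applies to the 'Cable' sample in Figure 2: because $p_r \le 1$, its score is at most $0.977 \times 0.048 \approx 0.047$, however confident the forward pass was.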