NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation

Zhiyu Xu, Qingliang Chen

TL;DR

This work analyzes two core weaknesses in prompt-based, SAM-driven one-shot segmentation: that patch similarity derived from raw features is distorted by complex feature interactions, and that channel-value distributions are uneven, giving dominance to a few channels. It proposes NubbleDrop, a training-free method that randomly drops feature channels during matching to mitigate deceptive channels with negligible overhead. Across COCO-20i, LVIS-92i, FSS-1000, and PASCAL-Part, and over multiple vision foundation models, MN (Matcher with NubbleDrop) achieves notable gains (e.g., 53.5 mIoU on COCO-20i and 34.0% on LVIS-92i) and demonstrates strong cross-backbone improvements, underscoring the method’s robustness and transferability. The results imply that simple, low-cost channel perturbations can meaningfully improve prompting-based segmentation when facing imperfect feature representations, with potential applicability to a broader set of similarity computing tasks.

Abstract

Driven by segmentation models trained on large-scale data, such as SAM, research in one-shot segmentation has advanced significantly. Recent contributions such as PerSAM and MATCHER, presented at ICLR 2024, take a similar approach: they leverage SAM with one or a few reference images to generate high-quality segmentation masks for target images. Specifically, they use raw encoded features to compute cosine similarity between patches in the reference and target images along the channel dimension, and use the result to generate prompt points or boxes for the target images, a technique referred to as the matching strategy. However, relying solely on raw features may introduce biases and lacks robustness for such a complex task. To address this concern, we examine the issues of feature interaction and uneven distribution inherent in raw-feature-based matching. In this paper, we propose NubbleDrop, a simple and training-free method that enhances the validity and robustness of the matching strategy at no additional computational cost. The core concept is to randomly drop feature channels (set them to zero) during the matching process, thereby preventing models from being influenced by channels containing deceptive information. This technique mimics discarding pathological nubbles, and it can be seamlessly applied to other similarity computing scenarios. We conduct a comprehensive set of experiments, considering a wide range of factors, to demonstrate the effectiveness and validity of the proposed method. Our results show that this simple and straightforward approach yields significant improvements.
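The core idea described above (zero out randomly chosen channels, then compute patch-wise cosine similarity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `nubble_drop_similarity`, the `drop_rate` default, and the choice to share one random mask between reference and target features are all assumptions made for illustration.

```python
import numpy as np

def nubble_drop_similarity(ref_feats, tgt_feats, drop_rate=0.1, rng=None):
    """Cosine similarity between patches after random channel dropping.

    ref_feats: (num_ref_patches, num_channels) array
    tgt_feats: (num_tgt_patches, num_channels) array
    drop_rate: probability of zeroing each channel (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    num_channels = ref_feats.shape[1]

    # Bernoulli keep-mask over channels; 0 marks a dropped channel.
    # Assumption: the same mask is applied to both feature sets.
    keep = (rng.random(num_channels) >= drop_rate).astype(ref_feats.dtype)
    ref = ref_feats * keep
    tgt = tgt_feats * keep

    # L2-normalize along the channel dimension; eps guards all-zero rows.
    eps = 1e-8
    ref = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + eps)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + eps)

    # Entry (i, j) is the cosine similarity of ref patch i and tgt patch j.
    return ref @ tgt.T
```

With `drop_rate=0.0` the function reduces to plain cosine similarity, which makes it straightforward to compare matching with and without channel dropping on the same features. The only extra work is one element-wise mask multiplication per feature set.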

Paper Structure

This paper contains 13 sections, 9 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Number of reference images with mismatches among 500 randomly selected images from COCO-20$^{i}$. On the x-axis, 1 to 5 denote the five groups of 100 images each into which the 500 images are divided for analysis; 6 denotes the average.
  • Figure 2: The y-axis shows the number of images, out of 500 randomly selected from the COCO-20$^i$ dataset, for which the maximum channel value in the normalized features encoded by DINOv2 exceeds the value indicated on the x-axis.
  • Figure 3: The y-axis shows the number of images, out of 1000 randomly selected from the COCO-20$^i$ dataset, for which the variance of the normalized features encoded by DINOv2 falls below the value indicated on the x-axis.
  • Figure 4: Illustration of NubbleDrop. Note that 'X' signifies that the channel is dropped, i.e., set to 0.
  • Figure 5: Examples of PASCAL-Part dataset.
  • ...and 2 more figures
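
The per-image statistics plotted in Figures 2 and 3 (maximum channel value and variance of normalized features, counted against a threshold) could be computed along the following lines. This is a rough sketch only: the captions do not specify the normalization, so per-patch L2 normalization is assumed here, and the helper names `channel_stats`, `count_exceeding`, and `count_below` are hypothetical.

```python
import numpy as np

def channel_stats(feats):
    """Return (max channel value, variance) of one image's normalized features.

    feats: (num_patches, num_channels) array of encoder features.
    Assumption: "normalized" means L2 normalization per patch.
    """
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return float(normed.max()), float(normed.var())

def count_exceeding(values, thresholds):
    """For each threshold, count images whose statistic is above it."""
    values = np.asarray(values)
    return [int((values > t).sum()) for t in thresholds]

def count_below(values, thresholds):
    """For each threshold, count images whose statistic is below it."""
    values = np.asarray(values)
    return [int((values < t).sum()) for t in thresholds]
```

Collecting `channel_stats` over a set of images and passing the per-image maxima to `count_exceeding` (or the variances to `count_below`) gives curves of the kind the two captions describe.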