CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers

Shahaf Arica, Or Rubin, Sapir Gershov, Shlomi Laufer

TL;DR

VoteCut is a method for unsupervised object discovery that leverages feature representations from multiple self-supervised models. CuVLER (Cut-Vote-and-LEaRn) is a zero-shot model trained on pseudo-labels generated by VoteCut, using a novel soft target loss to refine segmentation accuracy.

Abstract

In this paper, we introduce VoteCut, an innovative method for unsupervised object discovery that leverages feature representations from multiple self-supervised models. VoteCut employs normalized-cut-based graph partitioning, clustering, and a pixel voting approach. Additionally, we present CuVLER (Cut-Vote-and-LEaRn), a zero-shot model trained using pseudo-labels generated by VoteCut and a novel soft target loss to refine segmentation accuracy. Through rigorous evaluations across multiple datasets and several unsupervised setups, our methods demonstrate significant improvements over previous state-of-the-art models. Our ablation studies further highlight the contribution of each component, revealing the robustness and efficacy of our approach. Collectively, VoteCut and CuVLER pave the way for future advancements in image segmentation.

Paper Structure

This paper contains 19 sections, 8 equations, 8 figures, and 9 tables.

Figures (8)

  • Figure 1: (a) An illustrated overview of the VoteCut workflow. A set of models first runs inference on the input image, producing a feature representation for each patch. Normalized Cuts (NCut) are then performed following the methodology of TokenCut (Wang et al., 2022), yielding, for each model, the eigenvector associated with the second smallest eigenvalue. Multiple segment proposals are generated by applying 1D K-means clustering to these eigenvectors with varying K values. The final stage of VoteCut clusters these proposals and extracts a definitive mask from each cluster via voting. Each definitive mask is also assigned a score. (b) Detail of the "Clustering & Voting" stage of VoteCut. First, segments are clustered using an Intersection over Union (IoU) threshold, which determines segment membership within clusters. A voting mechanism within each cluster then decides whether each patch is included in the segment. Lastly, a Conditional Random Field (CRF) (Krähenbühl and Koltun, 2011) is applied to refine the mask at a finer level. The cluster size determines the score assigned to each mask, as given by the paper's score equation (eq:score).
  • Figure 2: Visual illustration of VoteCut performance vs. SOTA NCut-based object-discovery methods on the ImageNet validation set. The VoteCut bounding box score is calculated according to the paper's score equation (eq:score).
  • Figure 3: In-domain evaluation of the VoteCut method, without CAD training, with varying $\tau^m$ on the ImageNet validation set.
  • Figure 4: Results of the VoteCut method without CAD training in an in-domain configuration with different $k_{max}$ values on the ImageNet validation set.
  • Figure 5: Model count ablation test. The results are obtained in an in-domain setup on the ImageNet validation set using the VoteCut method without CAD training.
  • ...and 3 more figures
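
The workflow described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the thresholded cosine affinity with `tau` and `eps`, the rule that every K-means cluster becomes one proposal, the greedy first-member IoU clustering, the strict-majority vote, and the use of raw cluster size as the score are all hypothetical choices made here for clarity. The CRF refinement step is omitted.

```python
import numpy as np


def second_smallest_eigvec(feats, tau=0.2, eps=1e-5):
    """Solve (D - W) x = lambda * D x and return the second smallest eigenvector.

    feats: (N, C) patch features from one model. tau and eps are assumed
    values for a thresholded affinity, not settings taken from the paper.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = np.where(f @ f.T > tau, 1.0, eps)        # thresholded cosine affinity
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                # eigenvalues in ascending order
    return d_inv_sqrt * vecs[:, 1]               # map back to the generalized problem


def kmeans_1d(x, k, iters=50):
    """Plain Lloyd iterations on scalar values; returns one label per entry."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels


def proposals_from_eigvec(vec, grid_shape, k_max=3):
    """Assumed proposal rule: each non-empty cluster, for K = 2..k_max, is one mask."""
    masks = []
    for k in range(2, k_max + 1):
        labels = kmeans_1d(vec, k)
        masks += [(labels == j).reshape(grid_shape)
                  for j in range(k) if np.any(labels == j)]
    return masks


def iou(a, b):
    """Intersection over Union of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0


def cluster_and_vote(proposals, iou_thresh=0.5, vote_thresh=0.5):
    """Greedy IoU clustering, then a per-patch vote inside each cluster.

    Returns (mask, cluster_size) pairs; cluster size stands in for the score.
    """
    clusters = []                                # first member acts as the pivot
    for mask in proposals:
        for members in clusters:
            if iou(mask, members[0]) >= iou_thresh:
                members.append(mask)
                break
        else:
            clusters.append([mask])
    results = []
    for members in clusters:
        votes = np.stack(members).astype(float).mean(axis=0)
        results.append((votes > vote_thresh, len(members)))
    return results
```

In use, proposals from all models and all K values would be pooled into one list before calling `cluster_and_vote`, so that masks on which several models agree end up in larger clusters and receive higher scores.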