Localizing Objects with Self-Supervised Transformers and no Labels

Oriane Siméoni; Gilles Puy; Huy V. Vo; Simon Roburin; Spyros Gidaris; Andrei Bursuc; Patrick Pérez; Renaud Marlet; Jean Ponce

TL;DR

LOST addresses unsupervised object localization by exploiting patch-level keys from a self-supervised vision transformer (DINO). It selects as a seed the patch with minimal cross-patch correlations, expands the seed to related patches, and extracts a bounding box around them, all without any labeled data. The method yields state-of-the-art CorLoc on VOC07/12, makes it possible to train unsupervised class-agnostic and class-aware detectors purely on pseudo-labels, and gives competitive unsupervised detection results across multiple datasets. Because it operates on a single image with linear complexity, LOST scales well and can bootstrap downstream unsupervised detection and categorization in real-world pipelines.
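The seed, expand, and box steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lost_box`, the parameter `k`, the use of raw dot products as correlations, and the 4-connected component step are all assumptions made for illustration, and the pairwise similarity table is computed in full only to keep the code short.

```python
from collections import deque


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def lost_box(keys, height, width, k=3):
    """Hypothetical sketch: return (x_min, y_min, x_max, y_max) in patch units.

    keys: list of height * width key vectors, in row-major order.
    k: assumed number of low-degree patches considered for seed expansion.
    """
    n = height * width
    assert len(keys) == n

    # Pairwise correlations between patch keys (full table, for clarity only).
    sim = [[dot(keys[i], keys[j]) for j in range(n)] for i in range(n)]

    # Degree of a patch: how many patches it correlates positively with.
    degree = [sum(1 for s in row if s > 0) for row in sim]

    # Seed: the patch with minimal cross-patch correlations.
    seed = min(range(n), key=lambda i: degree[i])

    # Expansion (assumed rule): among the k lowest-degree patches, keep
    # those that correlate positively with the seed.
    candidates = sorted(range(n), key=lambda i: degree[i])[:k]
    seeds = [seed] + [i for i in candidates if i != seed and sim[seed][i] > 0]

    # Mask: patches whose summed correlation with the seed set is positive.
    mask = [sum(sim[s][i] for s in seeds) > 0 for i in range(n)]

    # Keep the 4-connected component of the mask that contains the seed.
    component = {seed}
    queue = deque([seed])
    while queue:
        r, c = divmod(queue.popleft(), width)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            j = rr * width + cc
            if 0 <= rr < height and 0 <= cc < width and mask[j] and j not in component:
                component.add(j)
                queue.append(j)

    rows = [i // width for i in component]
    cols = [i % width for i in component]
    return (min(cols), min(rows), max(cols), max(rows))


# Toy 4x4 grid: a 2x2 "object" whose keys anti-correlate with the background.
background, foreground = [1.0, 0.2], [-1.0, 0.2]
toy_keys = [
    foreground if r in (1, 2) and c in (1, 2) else background
    for r in range(4)
    for c in range(4)
]
toy_box = lost_box(toy_keys, 4, 4)
```

In the toy grid the object patches have the lowest degree because they correlate positively only with each other, so the seed falls on the object and the box covers the central 2x2 block.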

Abstract

Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, requires neither external object proposals nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
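For readers unfamiliar with the CorLoc metric quoted above, the sketch below computes it under the usual convention: one predicted box per image, counted as correct when its intersection over union (IoU) with at least one ground-truth box exceeds 0.5. The function names, the `(x0, y0, x1, y1)` box format, and the strict inequality are assumptions of this sketch rather than details taken from the text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions, ground_truths, threshold=0.5):
    """Percentage of images whose predicted box matches a ground-truth box.

    predictions: one box per image.
    ground_truths: one list of boxes per image, in the same order.
    """
    hits = sum(
        1
        for pred, gts in zip(predictions, ground_truths)
        if any(iou(pred, gt) > threshold for gt in gts)
    )
    return 100.0 * hits / len(predictions)


# Two images: the first prediction matches exactly, the second misses.
example_score = corloc(
    [(0, 0, 10, 10), (0, 0, 10, 10)],
    [[(0, 0, 10, 10)], [(20, 20, 30, 30)]],
)
```

A gain of "8 CorLoc points" then means that this percentage rises by 8 on the same set of images.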
