Boosting Unsupervised Segmentation Learning
Alp Eren Sari, Francesco Locatello, Paolo Favaro
TL;DR
This work tackles the limited resolution of unsupervised segmentation masks produced by state-of-the-art methods that rely on downsampled features. It introduces two practical techniques. The first is guided filtering, which uses the luminance channel as guidance to refine segmentation masks with negligible compute overhead. The second is a multi-scale consistency criterion, implemented via a teacher-student framework with a cropping-based equivariance loss $L_{eq}$ and a stop-gradient mechanism that prevents mask collapse. The methods deliver state-of-the-art results on the unsupervised saliency benchmarks DUT-OMRON, DUTS-TE, and ECSSD, and improve CorLoc for unsupervised single-object detection on VOC and COCO20K, while remaining backbone-agnostic. The approach is modular and easy to apply across diverse unsupervised segmentation methods. Code is to be released, and extensive ablations identify guided filtering as a key driver of the gains.
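To make the guided-filtering step concrete, here is a minimal NumPy sketch of a standard guided filter (in the style of He et al.) applied to an upsampled mask with the image luminance as guidance. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Rec. 601 luma weights, the window radius `r`, and the regularizer `eps` are all choices made here and do not come from the source.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, replicating edges at the border."""
    k = 2 * r + 1
    p = np.pad(x, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1), dtype=np.float64)
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, w = x.shape
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)

def luminance(rgb):
    """Luminance of an H x W x 3 image in [0, 1] (Rec. 601 weights, an assumption)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def guided_filter(guide, mask, r=4, eps=1e-3):
    """Refine `mask` so that its edges follow the edges of `guide`.

    Both inputs are H x W float arrays. `r` and `eps` are illustrative defaults.
    """
    mean_g = box_mean(guide, r)
    mean_m = box_mean(mask, r)
    cov_gm = box_mean(guide * mask, r) - mean_g * mean_m
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    # Per-window linear model: mask ~ a * guide + b.
    a = cov_gm / (var_g + eps)
    b = mean_m - a * mean_g
    # Average the coefficients of all windows covering each pixel.
    return box_mean(a, r) * guide + box_mean(b, r)
```

A typical use would upsample a low-resolution mask to the image size (for example with `np.repeat` along both axes) and then call `guided_filter(luminance(image), upsampled_mask)`. The filter needs only a handful of box averages, which is consistent with the negligible overhead described above.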
Abstract
We present two practical improvement techniques for unsupervised segmentation learning. These techniques address limitations in the resolution and accuracy of the segmentation maps predicted by recent state-of-the-art methods. First, we leverage image post-processing techniques such as guided filtering to refine the output masks, improving accuracy while avoiding substantial computational costs. Second, we introduce a multi-scale consistency criterion based on a teacher-student training scheme. This criterion enforces agreement between segmentation masks predicted from regions of the input image extracted at different resolutions. Experimental results on several benchmarks used in unsupervised segmentation learning demonstrate the effectiveness of our proposed techniques.
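The multi-scale consistency idea can be sketched as a cropping-based equivariance check: segmenting a zoomed-in crop should give the same mask as cropping and zooming the segmentation of the full image. The NumPy sketch below only computes the loss value. Several details are assumptions made for illustration and are not taken from the source: the helper names, the nearest-neighbor resizing, the mean-squared-error penalty, and the assignment of the full image to the teacher and the crop to the student. In an autodiff framework, the teacher target would additionally be excluded from the gradient (for example with `detach()` in PyTorch or `jax.lax.stop_gradient` in JAX).

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbor resize of an H x W (x C) array."""
    rows = (np.arange(out_h) * x.shape[0]) // out_h
    cols = (np.arange(out_w) * x.shape[1]) // out_w
    return x[rows][:, cols]

def crop(x, box):
    """Crop with box = (top, left, height, width)."""
    top, left, h, w = box
    return x[top:top + h, left:left + w]

def equivariance_loss(student, teacher, image, box):
    """Mean squared disagreement between the two branches (hypothetical form).

    `student` and `teacher` are callables mapping an H x W x 3 image to an
    H x W mask in [0, 1].
    """
    H, W = image.shape[:2]
    # Teacher branch: segment the full image, then crop and upscale the mask.
    # This target is treated as a constant (the stop-gradient side).
    target = resize_nearest(crop(teacher(image), box), H, W)
    # Student branch: crop and upscale the image first, then segment it.
    pred = student(resize_nearest(crop(image, box), H, W))
    return float(np.mean((pred - target) ** 2))
```

A segmenter that decides each pixel independently is exactly equivariant under this crop-and-resize operation, so its loss is zero. A network that relies on downsampled features generally is not, and that gap is what the consistency criterion penalizes.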
