SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

Danni Yang; Jiayi Ji; Yiwei Ma; Tianyu Guo; Haowei Wang; Xiaoshuai Sun; Rongrong Ji

TL;DR

This paper tackles the high labeling cost of referring expression segmentation (RES) by introducing SemiRES, a semi-supervised framework that leverages the Segment Anything Model (SAM) to refine noisy pseudo-labels. It deploys two SAM-based matching strategies, IoU-based Optimal Matching (IOM) and Composite Parts Integration (CPI), along with a Pixel-Wise Adjustment (PWA) to handle unmatched cases, all within a teacher-student training paradigm. Empirical results on RefCOCO, RefCOCO+, and G-Ref show that SemiRES consistently outperforms fully supervised and baseline semi-supervised approaches, including substantial gains at very low labeled data fractions (e.g., 1%). The work reduces labeling costs while delivering robust RES performance, highlighting the practical value of SAM-driven pseudo-label refinement for vision-language tasks.

Abstract

In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform referring expression segmentation (RES). A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarcation, to improve the accuracy of these pseudo-labels. Within SemiRES, we offer two alternative matching strategies: IoU-based Optimal Matching (IOM) and Composite Parts Integration (CPI). These strategies are designed to extract the most accurate masks from SAM's output, thus guiding the training of the student model with enhanced precision. In instances where a precise mask cannot be matched from the available candidates, we develop the Pixel-Wise Adjustment (PWA) strategy, which guides the student model's training directly with the pseudo-labels. Extensive experiments on three RES benchmarks (RefCOCO, RefCOCO+, and G-Ref) reveal its superior performance compared to fully supervised methods. Remarkably, with only 1% labeled data, our SemiRES outperforms the supervised baseline by a large margin, e.g., a +18.64% gain on the RefCOCO val set. The project code is available at https://github.com/nini0919/SemiRES.
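
The two matching strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the threshold values (iou_thresh, cover_thresh), and the exact CPI selection rule (keep a SAM segment when most of its area lies inside the pseudo-label) are assumptions made for illustration. Masks are assumed to be boolean arrays of the same shape, with the candidates produced beforehand by SAM.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def iou_optimal_match(pseudo, candidates, iou_thresh=0.5):
    """IOM-style sketch: pick the candidate with the highest IoU to the
    pseudo-label; return None when no candidate reaches iou_thresh
    (iou_thresh is an assumed value, not taken from the paper)."""
    if not candidates:
        return None
    scores = [iou(pseudo, c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] >= iou_thresh else None

def composite_parts(pseudo, candidates, cover_thresh=0.5):
    """CPI-style sketch: merge every candidate whose area lies mostly
    inside the pseudo-label; return None when no candidate qualifies
    (the coverage rule and cover_thresh are assumptions)."""
    merged = np.zeros_like(pseudo, dtype=bool)
    found = False
    for c in candidates:
        area = c.sum()
        if area == 0:
            continue  # skip empty segments
        if np.logical_and(pseudo, c).sum() / area >= cover_thresh:
            merged |= c
            found = True
    return merged if found else None
```

When either function returns None, no SAM mask was matched; in the framework described above, that is the case handled by PWA, where the student is supervised by the pseudo-label itself.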

Paper Structure

This paper contains 28 sections, 10 equations, 9 figures, 9 tables, and 1 algorithm.

Introduction
Related Work
Referring Expression Segmentation
Semi-Supervised Semantic Segmentation
Segment Anything Model
Method
Task Definition
Semi-Supervised Baseline
The Proposed SemiRES
Overview
SAM-based Pseudo-Label Refinement
Pixel-Wise Weighted Adjustment
Experiment
Datasets
Implementation Details
...and 13 more sections

Figures (9)

Figure 1: (a) A large number of noisy and incomplete cases exist in pseudo-labels. The proposed SemiRES can address this issue. (b) Analysis shows labeling a small portion of RefCOCO data can greatly reduce costs. (c) Our method substantially improves performance, even with a small number of annotated samples.
Figure 2: An overview of the proposed SemiRES, featuring a teacher-student network with data augmentation and mutual learning. It includes SAM-based pseudo-label refinement using IOM or CPI strategies, and PWA supervision when matches are not found.
Figure 3: Visualization of the principles behind IOM and CPI addressing pseudo-label issues in different cases.
Figure 4: Qualitative comparison of SemiRES, the supervised model, the semi-supervised baseline, and the ground truth. The white number in the bottom right corner is the IoU between the predicted mask and the ground truth. Objects enclosed by a red dashed box are incorrectly segmented. The supervised and semi-supervised models shown here were trained on 1% labeled data.
Figure 5: Demonstration of how IOM and CPI operate in matching pseudo-labels. The referring expression for this image is "kid looking at you".
...and 4 more figures
