RefCut: Interactive Segmentation with Reference Guidance

Zheng Lin, Nan Zhou, Chen-Xi Du, Deng-Ping Fan, Shi-Min Hu

TL;DR

RefCut tackles interactive segmentation ambiguity by introducing a reference-guided prompt mechanism that uses a reference image and masks to steer segmentation of a target image. The architecture merges a reference branch with a target interactive branch, producing prompts P_r^+ and P_r^- via a Reference Prompt Generator to align outputs with reference semantics and granularity. The authors also introduce the Target Disassembly Dataset (TDA) to benchmark part- and object-level ambiguities, and report state-of-the-art performance on PartImageNet, PASCAL-Part, and TDA across single-part, multi-part, and whole-object evaluations. Overall, RefCut reduces the interactive burden for large-scale, target-specific annotation by leveraging intuitive reference guidance, with code and demonstrations to follow.

Abstract

Interactive segmentation aims to segment a specified target in an image using positive and negative clicks from users. Interactive ambiguity is a crucial issue in this field: the same clicks can be compatible with multiple outcomes, such as a part of an object versus the entire object, or a single object versus a combination of multiple objects. Existing methods cannot provide intuitive guidance to the model, which leads to unstable outputs and makes it difficult to meet the requirements of large-scale, efficient annotation of specific targets in some scenarios. To bridge this gap, we introduce RefCut, a reference-based interactive segmentation framework designed to address part ambiguity and object ambiguity in segmenting specific targets. Users only need to provide a reference image and corresponding reference masks, and the model will be optimized based on them, which greatly reduces the interactive burden on users when annotating a large number of such targets. In addition, to enrich these two kinds of ambiguous data, we propose a new Target Disassembly Dataset, which contains two subsets, part disassembly and object disassembly, for evaluation. In combined evaluation across multiple datasets, RefCut achieves state-of-the-art performance. Extensive experiments and visualized results demonstrate that RefCut advances the field of intuitive and controllable interactive segmentation. Our code will be publicly available, and the demo video is available at https://www.lin-zheng.com/refcut.
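The notion of interactive ambiguity described in the abstract can be illustrated with a small toy sketch. This is not the RefCut model or any code from the paper; the grid, the masks, and the helper `is_compliant` are hypothetical and exist only to show that one set of clicks can be consistent with both a part-level and an object-level mask.

```python
from typing import List, Tuple

Mask = List[List[int]]   # binary mask, 1 = foreground
Click = Tuple[int, int]  # (row, col)


def is_compliant(mask: Mask, positive: List[Click], negative: List[Click]) -> bool:
    """Return True if the mask covers every positive click and no negative click."""
    covers_positive = all(mask[r][c] == 1 for r, c in positive)
    avoids_negative = all(mask[r][c] == 0 for r, c in negative)
    return covers_positive and avoids_negative


# Toy 4x6 image (assumed for illustration): the whole object spans
# columns 1..4, while one of its parts spans only columns 1..2.
whole_object: Mask = [[1 if 1 <= c <= 4 else 0 for c in range(6)] for _ in range(4)]
part_only: Mask = [[1 if 1 <= c <= 2 else 0 for c in range(6)] for _ in range(4)]

positive_clicks = [(1, 1)]  # a click inside the part (and hence inside the object)
negative_clicks = [(0, 0)]  # a click on the background

candidates = {"part": part_only, "whole object": whole_object}
compliant = [
    name
    for name, mask in candidates.items()
    if is_compliant(mask, positive_clicks, negative_clicks)
]
print(compliant)  # both candidates satisfy the same clicks
```

In this toy setting, an extra negative click at `(1, 3)` would rule out the whole-object mask and leave only the part. Repeating such corrective clicks on every image is the kind of interactive burden that reference guidance is meant to reduce.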

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison between our RefCut and other methods for Interactive Segmentation (IS). (a) General IS methods focus on the object level; (b) granularity-based IS methods require selecting targets or inputting granularity values, which is universal but lacks stability; (c) our reference-based IS method provides more intuitive information to efficiently resolve part ambiguity and object ambiguity under specific requirements.
  • Figure 2: The framework of RefCut. The upper branch shows the extraction process of the reference prompts from the reference image and masks, and the lower branch shows the classic interactive segmentation pipeline from the target image and user clicks.
  • Figure 3: Examples from the Target Disassembly Dataset, with subsets for part disassembly (TDA-PD) and object disassembly (TDA-OD).
  • Figure 4: The visualized results. The yellow contours in the reference and target images represent the reference masks and segmentation results, respectively. The baseline results are similar to those obtained using a single object as a reference (4th, 6th, and 7th columns).
  • Figure 5: The ablation study for the reference mask quality and target size on TDA subsets with RefCut based on SBD (Hariharan et al., 2011). P and N denote adopting the positive and negative reference, respectively.