GraCo: Granularity-Controllable Interactive Segmentation

Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen

TL;DR

GraCo addresses spatial ambiguity in interactive segmentation by letting the user set the prediction granularity explicitly through an additional input parameter. It introduces an automated Any-Granularity Mask Generator (AGG) that produces abundant mask-granularity pairs, and a Granularity-Controllable Learning (GCL) strategy that injects granularity prompts into a pre-trained IS model via discrete embeddings and LoRA adapters. The method achieves state-of-the-art performance on object and part benchmarks, often surpassing multi-granularity approaches like SAM, while remaining low-cost and flexible for diverse segmentation tasks. The work also positions GraCo as a practical annotation tool that adapts to a wide range of granularities with minimal manual effort. Overall, GraCo demonstrates robust granularity controllability aligned with human cognition and strong generalization across benchmarks.

Abstract

Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant results. In this work, we introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to input. This enhances the customization of the interactive system and eliminates redundancy while resolving ambiguity. Nevertheless, the exorbitant cost of annotating multi-granularity masks and the lack of available datasets with granularity annotations make it difficult for models to acquire the necessary guidance to control output granularity. To address this problem, we design an any-granularity mask generator that exploits the semantic property of the pre-trained IS model to automatically generate abundant mask-granularity pairs without requiring additional manual annotation. Based on these pairs, we propose a granularity-controllable learning strategy that efficiently imparts the granularity controllability to the IS model. Extensive experiments on intricate scenarios at object and part levels demonstrate that our GraCo has significant advantages over previous methods. This highlights the potential of GraCo to be a flexible annotation tool, capable of adapting to diverse segmentation scenarios. The project page: https://zhao-yian.github.io/GraCo.
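
The TL;DR describes GCL as injecting granularity prompts through discrete embeddings and LoRA into a pre-trained IS model. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The bin count, feature width, rank, the [0, 1] granularity range, and all names (`granularity_to_bin`, `LoRALinear`, `inject_granularity`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_BINS = 10   # hypothetical number of discrete granularity levels
EMBED_DIM = 8   # hypothetical token feature width
LORA_RANK = 2   # hypothetical low-rank adapter size


def granularity_to_bin(g: float, num_bins: int = NUM_BINS) -> int:
    """Quantize a granularity value (assumed to lie in [0, 1]) to a bin index."""
    g = min(max(g, 0.0), 1.0)  # clamp out-of-range inputs
    return min(int(g * num_bins), num_bins - 1)


class LoRALinear:
    """Frozen dense layer plus a low-rank update: y = x W^T + s * x A^T B^T."""

    def __init__(self, dim_in: int, dim_out: int, rank: int, alpha: float = 1.0):
        self.weight = rng.normal(size=(dim_out, dim_in))      # stands in for pre-trained weights
        self.lora_a = rng.normal(size=(rank, dim_in)) * 0.01  # trainable down-projection
        self.lora_b = np.zeros((dim_out, rank))               # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight.T + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# One learnable vector per granularity bin (random here, purely for illustration).
granularity_table = rng.normal(size=(NUM_BINS, EMBED_DIM))


def inject_granularity(tokens: np.ndarray, g: float) -> np.ndarray:
    """Add the embedding of granularity g to every token (broadcast over tokens)."""
    return tokens + granularity_table[granularity_to_bin(g)]


tokens = rng.normal(size=(4, EMBED_DIM))  # 4 toy image tokens
layer = LoRALinear(EMBED_DIM, EMBED_DIM, LORA_RANK)
out = layer(inject_granularity(tokens, 0.35))
```

Because `lora_b` starts at zero, the adapted layer initially reproduces the frozen layer exactly, which is the usual reason LoRA can be attached to a pre-trained model without disturbing its behavior before fine-tuning begins.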

Paper Structure (17 sections, 7 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: (a): Single-granularity IS ignores spatial ambiguity. (b): Multi-granularity IS is limited in the number of outputs and produces redundant results. (c): Our Granularity-Controllable IS allows precise control of output granularity to match user expectations by attaching additional parameters to the input.
  • Figure 2: Illustration of our granularity-controllable interactive segmentation. Our GraCo consists of two stages. In the first stage, the Any-Granularity Mask Generator (AGG) automatically generates any-granularity proposals (mask engine) and granularity annotations (granularity estimator) based on the object GT, without requiring additional manual annotation. In the second stage, the mask-granularity pairs generated by AGG are used to perform Granularity-Controllable Learning (GCL) on the object-level pre-trained IS model, enabling the model to efficiently acquire granularity controllability.
  • Figure 3: Illustration of the multi-granularity loop simulation and visualization of the mask proposals generated by AGG.
  • Figure 4: Verification of the granularity controllability. We calculate IoU@k under different granularities to plot IoU-granularity curves. The optimal granularity (marked by the red star) of the objects is about 1.0, while for the parts of the cow from PascalPart [chen2014detect] it is different.
  • Figure 5: Visualization of interactive segmentation on part GT using SimpleClick [liu2023simpleclick] and our GraCo. We note the input granularity for our GraCo, which is roughly estimated based on human cognition.
  • ...and 3 more figures
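
The IoU-granularity curve described in the Figure 4 caption can be sketched as a sweep over granularity values that scores each prediction against a fixed ground-truth mask and then picks the best value. This is a toy illustration only: `toy_predict` is a hypothetical stand-in for a granularity-controllable model (a square that grows with granularity), and the click budget implied by IoU@k is not modeled.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0


def toy_predict(granularity: float, size: int = 20) -> np.ndarray:
    """Hypothetical model stand-in: a top-left square whose side grows with granularity."""
    side = max(1, round(granularity * size))
    mask = np.zeros((size, size), dtype=bool)
    mask[:side, :side] = True
    return mask


def iou_granularity_curve(predict, gt, granularities):
    """Score the prediction at each granularity against one ground-truth mask."""
    return [iou(predict(g), gt) for g in granularities]


granularities = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0

part_gt = toy_predict(0.5)    # toy "part" ground truth (10x10 square)
object_gt = toy_predict(1.0)  # toy "object" ground truth (full 20x20 square)

part_curve = iou_granularity_curve(toy_predict, part_gt, granularities)
object_curve = iou_granularity_curve(toy_predict, object_gt, granularities)

# The optimal granularity is the argmax of each curve (the red star in Figure 4).
best_part = granularities[int(np.argmax(part_curve))]
best_object = granularities[int(np.argmax(object_curve))]
```

In this toy setup the object curve peaks at granularity 1.0 and the part curve peaks at a lower value, mirroring the qualitative pattern the caption reports.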