CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for Interactive Image Segmentation

Shoukun Sun, Min Xian, Fei Xu, Luca Capriotti, Tiankai Yao

TL;DR

This paper tackles the inefficiency of click-based interactive image segmentation with a three-component framework: Iterative Click Loss (ICL), which penalizes excessive user clicks during training; Cascade-Forward Refinement (CFR), which performs unified coarse-to-fine refinement during inference without extra networks; and SUEM Copy-Paste (C&P) augmentation, which expands training data diversity. CFR uses a two-loop inference scheme with an outer user interaction loop that produces a coarse prediction and an inner refinement loop. ICL accumulates the per-step losses as a weighted sum, $L_{\mathrm{ICL}} = \sum_{i=1}^{t} \beta_i \, \mathbb{L}(Y^i, \mathbb{Y})$, to bias the model toward fewer interactions. SUEM C&P creates varied synthetic training scenarios across four copy-paste modes to strengthen robustness. Results on five public datasets show state-of-the-art performance and substantially fewer clicks needed to reach high IoU thresholds: relative to the previous state of the art, the number of clicks required to surpass an IoU of 0.95 (NoC@95) drops by 33.2% on Berkeley and 15.5% on DAVIS. The approach is also shown to be compatible with backbones such as ViT-H and HRNet. In practice, it lowers user effort in interactive segmentation and provides a general framework adaptable to other iterative, mask-guided methods.
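The weighted sum in the ICL formula can be illustrated with a minimal sketch. It assumes the per-step predictions $Y^i$ are already available as flattened probability maps, and it uses mean binary cross-entropy as a stand-in for the per-step loss $\mathbb{L}$; the summary does not say which per-step loss or which weights $\beta_i$ the authors use, so `bce`, `iterative_click_loss`, and the example weights are hypothetical names and values chosen for illustration only.

```python
import math
from typing import Callable, Sequence

Mask = Sequence[float]  # flattened probability map or binary ground truth


def bce(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy (assumed stand-in for the per-step loss)."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(pred)


def iterative_click_loss(
    step_preds: Sequence[Mask],
    target: Mask,
    betas: Sequence[float],
    step_loss: Callable[[Mask, Mask], float] = bce,
) -> float:
    """Weighted sum of per-step losses: sum_i beta_i * L(Y^i, Y)."""
    if len(step_preds) != len(betas):
        raise ValueError("need exactly one weight per click step")
    return sum(b * step_loss(p, target) for b, p in zip(betas, step_preds))


# Two simulated click steps on a 2-pixel image (made-up values).
target = [1.0, 0.0]
step_preds = [[0.5, 0.5], [0.9, 0.1]]
loss = iterative_click_loss(step_preds, target, betas=[1.0, 1.0])
```

Because every click step contributes its own loss term, a model that is still inaccurate after many clicks accumulates a larger total than one that converges early, which is the sense in which the loss discourages excessive clicks.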

Abstract

Click-based interactive segmentation aims to extract the object of interest from an image with the guidance of user clicks. Recent work has achieved strong overall performance by employing feedback from the output. However, in most state-of-the-art approaches, 1) the inference stage involves inflexible heuristic rules and requires a separate refinement model, and 2) the number of user clicks and model performance cannot be balanced. To address these challenges, we propose a click-based and mask-guided interactive image segmentation framework containing three novel components: Cascade-Forward Refinement (CFR), Iterative Click Loss (ICL), and SUEM image augmentation. CFR offers a unified inference framework that generates segmentation results in a coarse-to-fine manner. ICL allows model training to improve segmentation and reduce user interactions simultaneously. SUEM augmentation is a comprehensive way to create large and diverse training sets for interactive image segmentation. Extensive experiments demonstrate the state-of-the-art performance of the proposed approach on five public datasets. Remarkably, compared with the previous state-of-the-art approach, our model reduces the number of clicks required to surpass an IoU of 0.95 by 33.2% and 15.5% on the Berkeley and DAVIS sets, respectively.
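The click-count metric behind these percentages can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: it assumes that the threshold counts as reached when IoU is greater than or equal to it, and that a sample which never reaches the threshold is charged a click budget of 20. Both are assumptions about the protocol, and `iou`, `noc`, and `relative_reduction` are hypothetical helper names.

```python
from typing import Sequence


def iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0  # two empty masks agree fully


def noc(ious_per_click: Sequence[float], threshold: float = 0.95,
        max_clicks: int = 20) -> int:
    """Number of clicks until IoU first reaches the threshold.

    ious_per_click[k - 1] is the IoU after the k-th click. If the
    threshold is never reached, the full budget max_clicks is returned.
    """
    for k, value in enumerate(ious_per_click[:max_clicks], start=1):
        if value >= threshold:
            return k
    return max_clicks


def relative_reduction(baseline_noc: float, new_noc: float) -> float:
    """Fraction of clicks saved relative to a baseline mean NoC."""
    return (baseline_noc - new_noc) / baseline_noc
```

A reported reduction such as 33.2% corresponds to `relative_reduction` applied to the mean NoC@95 of the baseline and of the proposed model on the same dataset.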

Paper Structure

This paper contains 12 sections, 5 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Examples of segmentation results that exceed an IoU of 0.95. The first column shows images with clicks (green for the foreground and red for the background) and segmentation masks from the proposed approach (blue). The second column shows the ground truth. The third column shows probability maps of the proposed approach. The raw images are from the Berkeley (Martin et al., 2001) and DAVIS (Perazzi et al., 2016) sets.
  • Figure 2: Overview of iterative mask-guided interactive segmentation integrated with Cascade-Forward Refinement (CFR). The orange lines represent the user interaction loop (outer loop). The green line represents the refinement loop (inner loop). The black lines are processes shared by both loops. The user adds new clicks in the user interaction loop. In the CFR loop, the previous mask is iteratively refined with the clicks.
  • Figure 3: Sample results of Cascade-Forward Refinement.
  • Figure 4: Illustration of copy-paste modes.
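
The two loops described in the Figure 2 caption can be sketched as a control-flow skeleton. This is a minimal sketch under stated assumptions, not the authors' implementation: the segmentation network is abstracted as a callable that takes the click list and the previous mask, the inner loop runs a fixed number of refinement passes (`refine_steps` is a hypothetical parameter), and `toy_model` and `toy_user` are made-up stand-ins so the skeleton can run.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Click = Tuple[int, bool]  # (pixel index, is_foreground)
Mask = List[float]        # flattened probability map
Model = Callable[[Sequence[Click], Mask], Mask]


def cfr_inference(model: Model,
                  get_click: Callable[[Mask], Optional[Click]],
                  num_pixels: int,
                  max_clicks: int,
                  refine_steps: int) -> Tuple[Mask, List[Click]]:
    """Outer loop adds user clicks; inner loop refines without new clicks."""
    clicks: List[Click] = []
    mask: Mask = [0.0] * num_pixels          # empty previous mask
    for _ in range(max_clicks):              # user interaction (outer) loop
        click = get_click(mask)
        if click is None:                    # user is satisfied
            break
        clicks.append(click)
        mask = model(clicks, mask)           # coarse prediction
        for _ in range(refine_steps):        # refinement (inner) loop
            mask = model(clicks, mask)       # same clicks, previous mask fed back
    return mask, clicks


# Toy stand-ins (hypothetical): each forward pass moves the mask halfway
# toward a fixed target, and the "user" always clicks pixel 0.
TARGET = [1.0, 0.0]
calls = 0


def toy_model(clicks: Sequence[Click], prev: Mask) -> Mask:
    global calls
    calls += 1
    return [p + 0.5 * (t - p) for p, t in zip(prev, TARGET)]


def toy_user(mask: Mask) -> Optional[Click]:
    return (0, True)


final_mask, used_clicks = cfr_inference(toy_model, toy_user, num_pixels=2,
                                        max_clicks=2, refine_steps=2)
```

The same model serves both loops, which matches the summary's point that CFR needs no separate refinement network; only the source of the input changes (a new user click in the outer loop, the model's own previous mask in the inner loop).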