ClickAttention: Click Region Similarity Guided Interactive Segmentation

Long Xu, Shanghong Li, Yongquan Chen, Junkang Chen, Rui Huang, Feng Wu

TL;DR

This work targets the inefficiency of traditional click-based interactive segmentation by expanding the influence of positive user clicks through local-region similarity and a patch-based attention mechanism. It introduces ClickAttention, supported by a nonlinear mapper $\phi$ and a discriminative affinity loss to decouple positive/negative click regions, integrated with a Segformer backbone. Empirical results show state-of-the-art or competitive performance on multiple benchmarks with significantly fewer parameters and improved efficiency, outperforming large models like SAM/HQ-SAM in many settings. The approach enables accurate segmentation with fewer, more informative clicks, offering practical value for low-resource devices and large-scale annotation tasks.
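The idea of spreading a positive click over similar patches can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `click_attention`, the mean-feature prototype, the cosine similarity, and the `temperature` parameter are all assumptions made for illustration, and the nonlinear mapper $\phi$ is omitted because its form is not given here. In a real model these steps would be tensor operations over backbone features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def click_attention(patch_feats, positive_idx, temperature=0.1):
    """Hypothetical helper: weight every patch by similarity to clicked patches.

    patch_feats  : list of N feature vectors, one per image patch
    positive_idx : indices of patches that contain a positive click
    Returns N weights that sum to 1 (softmax over similarities).
    """
    if not positive_idx:
        raise ValueError("at least one positive click is required")
    dim = len(patch_feats[0])
    # Prototype: mean feature of the positively clicked patches (assumption).
    proto = [sum(patch_feats[i][d] for i in positive_idx) / len(positive_idx)
             for d in range(dim)]
    sims = [cosine(f, proto) for f in patch_feats]
    # Softmax with temperature; subtract the max for numerical stability.
    m = max(sims)
    exps = [math.exp((s - m) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Toy input: patches 0 and 1 look alike, patches 2 and 3 look alike.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
attn = click_attention(feats, positive_idx=[0])
```

With a single click on patch 0, the similar patch 1 receives almost as much weight as the clicked patch, while the dissimilar patches 2 and 3 receive very little. That is the sense in which one click can influence more than its local neighborhood.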

Abstract

Click-based interactive segmentation has attracted significant research attention in recent years. However, existing studies typically feed sparse click maps to the model to segment a specific target object. Such maps mainly affect local regions and have limited ability to focus on the whole target object, which increases the number of clicks required. In addition, most existing algorithms do not balance high performance and efficiency well. To address these issues, we propose a click attention algorithm that expands the influence range of positive clicks based on the similarity between positively clicked regions and the whole input. We also propose a discriminative affinity loss that reduces the attention coupling between positive and negative click regions, avoiding the accuracy loss caused by mutual interference between positive and negative clicks. Extensive experiments demonstrate that our approach is superior to existing methods and achieves cutting-edge performance with fewer parameters. An interactive demo and all reproducible code will be released at https://github.com/hahamyt/ClickAttention.
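To make "attention coupling" concrete, the sketch below measures how much a positive-click attention map and a negative-click attention map overlap. It is an illustrative stand-in, not the discriminative affinity loss defined in the paper: the function name `coupling_penalty` and the elementwise-product form are assumptions chosen only to show what a decoupling objective would push toward zero.

```python
def coupling_penalty(attn_pos, attn_neg):
    """Hypothetical stand-in: overlap between two attention maps.

    attn_pos, attn_neg : per-patch weights of the positive and negative
                         click attention maps (same length).
    Returns the sum of elementwise products; 0 means fully decoupled.
    """
    if len(attn_pos) != len(attn_neg):
        raise ValueError("attention maps must cover the same patches")
    return sum(p * n for p, n in zip(attn_pos, attn_neg))

# Maps that attend to different patches do not interfere.
disjoint = coupling_penalty([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])

# Maps that attend to the same patches are fully coupled.
coupled = coupling_penalty([0.25] * 4, [0.25] * 4)
```

Minimizing a term of this kind during training would discourage patches from being activated by both click types at once, which is the interference the abstract describes.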

Paper Structure (15 sections, 8 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Attention visualization of clicked regions based on Transformer [dosovitskiy2020image, 9878519, liu2022simpleclick]. Red marks denote negative clicks; green marks denote positive clicks; black circles indicate the limited click influence range issue; red rectangles highlight regions that should not be activated in the attention maps of positively/negatively clicked regions.
  • Figure 2: Average IoU as a function of the number of clicks (averaged over 8 benchmarks), indicating that the proposed method obtains better precision with fewer clicks
  • Figure 3: Overall framework of the proposed algorithm. The four downsampling blocks come from Segformer [NEURIPS2021_64f1f27b]. The click attention block is used for both inference and training; the discriminative affinity loss is used for training
  • Figure 4: Click attention calculation based on patch similarity
  • Figure 5: Illustration of discriminative affinity loss calculation based on click attention
  • ...and 4 more figures
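
Figure 2 plots average IoU against the number of clicks. The following sketch shows how such a curve could be computed from per-sample results. The helper names `iou` and `mean_iou_per_click`, the flat 0/1 mask representation, and the toy data are assumptions for illustration and do not reproduce the paper's evaluation protocol.

```python
def iou(pred, gt):
    """Intersection over union of two flat binary masks (lists of 0/1)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0

def mean_iou_per_click(runs):
    """runs[s][k] is the IoU of sample s after k + 1 clicks.

    Returns one mean IoU per click count, truncated to the shortest run.
    """
    n_clicks = min(len(r) for r in runs)
    return [sum(r[k] for r in runs) / len(runs) for k in range(n_clicks)]

# Toy data: two samples, predictions after 1, 2, and 3 clicks.
gt_a = [1, 1, 1, 0]
preds_a = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
gt_b = [0, 1, 1, 0]
preds_b = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 0]]

runs = [[iou(p, gt_a) for p in preds_a], [iou(p, gt_b) for p in preds_b]]
curve = mean_iou_per_click(runs)
```

A method that uses clicks more effectively would show a curve that rises faster, reaching a given IoU with fewer clicks.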