
Order-aware Interactive Segmentation

Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu

TL;DR

Order-aware Interactive Segmentation (OIS) tackles interactive segmentation by injecting relative-depth cues, encoded as order maps, into an order-aware attention module, and by enforcing explicit object-level discrimination through an object-aware attention module. It combines dense and sparse prompt integration to preserve spatial alignment while keeping computation efficient. The framework achieves state-of-the-art results on HQSeg44K and DAVIS, with higher mIoU after a single click and roughly half the inference latency of prior leading methods. The work highlights the value of 3D-aware cues and explicit foreground/background (FG/BG) separation for robust segmentation in complex scenes, with practical benefits for efficient image and video editing and annotation.
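To make the idea of an order map concrete, here is a minimal NumPy sketch. It assumes a relative depth map is already available (for example from an off-the-shelf monocular depth estimator) and defines the order map for one click as the absolute depth difference to the clicked pixel, normalized to [0, 1]. This matches the description in Figure 3 (darker, i.e. smaller, values are closer in depth to the prompt-selected object), but the function name `order_map` and the exact formula are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def order_map(depth: np.ndarray, click: tuple) -> np.ndarray:
    """Hypothetical order map for a single click (illustrative, not the OIS code).

    depth: (H, W) relative depth map, assumed to come from a depth estimator.
    click: (row, col) pixel coordinates of the user click.
    Returns an (H, W) map in [0, 1]; 0 means "same depth as the clicked point".
    """
    depth = np.asarray(depth, dtype=float)
    d_click = depth[click]                  # depth value under the click
    diff = np.abs(depth - d_click)          # distance in depth to the clicked object
    max_diff = diff.max()
    if max_diff == 0:                       # flat depth: every pixel has the same order
        return np.zeros_like(diff)
    return diff / max_diff                  # normalize into [0, 1]


# Usage: one map per click; several clicks give a stack of maps.
depth = np.array([[1.0, 2.0], [3.0, 5.0]])
print(order_map(depth, (0, 0)))
```

Computing one map per click keeps positive and negative clicks separable, which is consistent with Figure 3 showing separate maps for positive and negative clicks.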

Abstract

Interactive segmentation aims to accurately segment target objects with minimal user interactions. However, current methods often fail to accurately separate target objects from the background, due to a limited understanding of order, the relative depth between objects in a scene. To address this issue, we propose OIS: order-aware interactive segmentation, where we explicitly encode the relative depth between objects into order maps. We introduce a novel order-aware attention, where the order maps seamlessly guide the user interactions (in the form of clicks) to attend to the image features. We further present an object-aware attention module to incorporate a strong object-level understanding to better differentiate objects with similar order. Our approach allows both dense and sparse integration of user clicks, enhancing both accuracy and efficiency as compared to prior works. Experimental results demonstrate that OIS achieves state-of-the-art performance, improving mIoU after one click by 7.61 on the HQSeg44K dataset and 1.32 on the DAVIS dataset as compared to the previous state-of-the-art SegNext, while also doubling inference speed compared to current leading methods. The project page is https://ukaukaaaa.github.io/projects/OIS/index.html
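One plausible way for order maps to "guide the user interactions to attend to the image features" is masked cross-attention: each click embedding (a sparse prompt token) attends only to image positions whose order value is below a cutoff, so depth-inconsistent regions receive zero attention weight. The sketch below illustrates that idea in NumPy. The masking rule, the `threshold=0.5` cutoff, and the fallback for fully masked rows are all assumptions for illustration; the paper's actual order-aware attention may combine order maps with attention differently.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def order_aware_attention(queries, keys, values, order_maps, threshold=0.5):
    """Hypothetical order-masked cross-attention (not the authors' code).

    queries:    (Q, C) click (sparse prompt) embeddings
    keys:       (N, C) flattened image features used as keys
    values:     (N, D) flattened image features used as values
    order_maps: (Q, N) per-click order values in [0, 1]; 0 = same depth as the click
    threshold:  assumed cutoff; positions with a larger order value are masked out
    """
    queries = np.asarray(queries, dtype=float)
    keys = np.asarray(keys, dtype=float)
    values = np.asarray(values, dtype=float)
    order_maps = np.asarray(order_maps, dtype=float)

    scale = 1.0 / np.sqrt(queries.shape[-1])
    logits = (queries @ keys.T) * scale          # (Q, N) scaled dot products
    allowed = order_maps <= threshold            # keep depth-consistent positions
    # Fallback: a click that would attend to nothing may attend everywhere.
    allowed[~allowed.any(axis=1)] = True
    logits = np.where(allowed, logits, -np.inf)  # masked positions get zero weight
    attn = softmax(logits, axis=-1)              # (Q, N), each row sums to 1
    return attn @ values, attn
```

Masking, rather than merely re-weighting, guarantees that a click placed on a near object cannot pull in features from a far background at similar appearance, which is the failure mode the abstract attributes to a missing notion of order.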

Paper Structure

This paper contains 40 sections, 5 equations, 17 figures, 12 tables.

Figures (17)

  • Figure 1: Comparison of our method with current state-of-the-art methods, SegNext liu2024rethinking and HQ-SAM ke2024segment, using 5 clicks (red dots represent positive clicks and green dots represent negative clicks). Our method better distinguishes the target (the gate) from the background (the trees and the building) and achieves a significantly higher intersection-over-union (IoU). This highlights the effectiveness of our contributions: (a) order (the relative depth between objects), (b) object awareness, and (c) the combination of dense and sparse prompt integration.
  • Figure 2: Overview of our proposed OIS framework. Our order maps (blue box, top left) capture the relative depth of objects in a scene, as described in Sec. [order]. The order maps selectively guide the sparse embeddings to attend to the image features in our novel order-aware attention module (highlighted in blue inside the order-level understanding block), also described in Sec. [order]. The object-aware attention module (highlighted in green in the object-level understanding block) imparts a strong discriminative notion of objects, as discussed in Sec. [object]. We use both sparse and dense integration of prompts (highlighted in red), as described in Sec. [mixed].
  • Figure 3: Illustration of order maps (after normalization to [0, 1]). Red dots in (c) indicate positive prompt clicks, and white dots in (d) and (e) represent negative prompt clicks. In the order maps, darker regions are closer in depth to the prompt-selected object, while lighter regions are farther from it.
  • Figure 4: Qualitative results on HQSeg44K. Green dots indicate the user's first clicks.
  • Figure 5: Qualitative results on DAVIS. Green dots indicate the user's first clicks.
  • ...and 12 more figures
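The IoU in Figure 1 and the "mIoU after one click" figures in the abstract are standard mask metrics. The NumPy sketch below shows how they are typically computed: IoU is the intersection of predicted and ground-truth masks divided by their union, and mIoU@k averages the IoU obtained after k clicks over all evaluation samples. The helper names and the nested-list layout of predictions are illustrative assumptions; benchmarks usually report the result multiplied by 100, so an improvement of 7.61 is read in percentage points.

```python
import numpy as np


def iou(pred, gt) -> float:
    """Intersection-over-union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: treated as a perfect match (a common convention)
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou_at_click(preds_per_sample, gts, k: int) -> float:
    """mIoU@k: average IoU after k clicks.

    Assumed layout: preds_per_sample[i][k - 1] is the predicted mask for
    sample i after k clicks, and gts[i] is its ground-truth mask.
    """
    return float(np.mean([iou(p[k - 1], g) for p, g in zip(preds_per_sample, gts)]))


# Usage: two samples, one predicted mask each (after the first click).
preds = [[[[1, 1], [0, 0]]], [[[1, 0], [0, 0]]]]
gts = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(mean_iou_at_click(preds, gts, k=1))
```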