HAISTA-NET: Human Assisted Instance Segmentation Through Attention

Muhammed Korkmaz; T. Metin Sezgin

TL;DR

This work proposes HAISTA-NET, a human-assisted segmentation model that augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. It also presents a dataset of hand-drawn partial object boundaries, referred to as human attention maps.

Abstract

Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond what even state-of-the-art, fully automated instance segmentation algorithms can deliver, and the performance gap becomes particularly prohibitive for small and complex objects. Practitioners therefore typically resort to fully manual annotation, which can be a laborious process. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present a dataset of hand-drawn partial object boundaries, which we refer to as human attention maps. This Partial Sketch Object Boundaries (PSOB) dataset contains hand-drawn partial boundaries that represent the curvatures of an object's ground-truth mask using only a few pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, improving AP-Mask by +36.7, +29.6, and +26.5 points over these three models, respectively. We hope that our approach, which combines fully automated and interactive instance segmentation architectures, will set a baseline for future human-aided deep learning models.
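The AP-Mask gains above compare predicted instance masks against ground-truth masks. As a reading aid, here is a minimal sketch of that metric under the common COCO-style definition, which this summary does not spell out: a prediction is a true positive when its mask IoU with a still-unmatched ground-truth mask meets a threshold, AP is computed from the resulting precision-recall curve, and the result is averaged over IoU thresholds 0.50 to 0.95. The function names are hypothetical, the sketch covers a single image and a single class, and it uses all-point interpolation rather than the 101 recall points of the official COCO evaluator, so its values can differ slightly. It returns scores on a 0 to 1 scale; the point gains quoted above presumably follow the usual AP × 100 convention.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty: treat as no overlap
    return float(np.logical_and(pred, gt).sum() / union)


def average_precision(pred_masks, scores, gt_masks, iou_thr):
    """All-point interpolated AP for one image and one class at one IoU threshold."""
    if len(pred_masks) == 0 or len(gt_masks) == 0:
        return 0.0
    # Visit predictions from highest to lowest confidence.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    matched = set()
    tp = np.zeros(len(order))
    for rank, i in enumerate(order):
        # Greedily match this prediction to the best still-unmatched ground truth.
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gt_masks):
            if j in matched:
                continue
            iou = mask_iou(pred_masks[i], gt)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    recall = cum_tp / len(gt_masks)
    # Pad the curve, then make precision non-increasing from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Sum precision over the recall steps where recall actually changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mask_ap(pred_masks, scores, gt_masks, thresholds=np.linspace(0.5, 0.95, 10)):
    """COCO-style mask AP: mean AP over IoU thresholds 0.50:0.05:0.95."""
    return float(np.mean([average_precision(pred_masks, scores, gt_masks, t)
                          for t in thresholds]))
```

For example, a single prediction whose mask overlaps its only ground-truth mask with IoU 0.8125 is a true positive at the seven thresholds from 0.50 to 0.80 and a false positive at the remaining three, so `mask_ap` returns 0.7. This is why AP-Mask rewards boundary precision: a mask that is roughly right but misses fine curvature loses credit at the strict thresholds.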

Paper Structure

This paper contains 17 sections, 2 equations, 5 figures, and 5 tables.

Introduction
Related Work
Proposed Approach
Partial Sketch Object Boundaries Dataset
Adaptive Object Curvature Detector
Representation of Human Attention Map
Network Architecture
Data Augmentation
Training Parameters
Inference
Experiment
Main Results
Multiple Factor Analysis
Curvature-Based Average Precision
PSOB Interaction Time Analysis
...and 2 more sections

Figures (5)

Figure 1: A first look at the precise mask predictions of HAISTA-NET using the human attention maps dataset (center image). HAISTA-NET outperforms the mask prediction of Strong Mask R-CNN (right) on high-curvature objects.
Figure 2: Method Outline. Users draw a few pixels on the object according to where their attention falls. A Human Attention Map is then generated and concatenated with the RGB image to form the model's input. We denote the concatenation operator by $\otimes$.
Figure 3: Interactive interface. Users can interact with the target object via this tool.
Figure 4: Visualization of HAISTA-NET predictions. We show images with different object scales, numbers of curvatures, and types of hand-drawn assistance.
Figure 5: Main and interaction effects on the $\text{AP}_{\text{Mask}}$ and $\text{AP}_{\text{Bbox}}$ values obtained with HAISTA-NET.
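The Figure 2 caption describes the input pipeline: a few user-drawn pixels become a Human Attention Map that is concatenated ($\otimes$) with the RGB image before it enters the network. The NumPy sketch below illustrates one plausible version of that step. It assumes the attention map is a single binary channel at image resolution, with each drawn pixel widened to a small square (the `radius` parameter is hypothetical); the paper's own encoding is described in its "Representation of Human Attention Map" section and may differ. It also shows a common way to let a backbone pretrained on 3-channel images accept a 4-channel input: appending an extra input channel to the first convolution's weights, initialized to the mean of the RGB filters. That widening step is an assumption for illustration, not a detail confirmed by this summary, and all function names are hypothetical.

```python
import numpy as np


def build_attention_map(height, width, stroke_pixels, radius=1):
    """Rasterize user-drawn (row, col) pixels into an HxW float map in {0, 1}.

    Each drawn pixel is marked together with a square neighborhood of the
    given radius, clipped at the image border (a hypothetical encoding).
    """
    attn = np.zeros((height, width), dtype=np.float32)
    for r, c in stroke_pixels:
        if not (0 <= r < height and 0 <= c < width):
            continue  # ignore points that fall outside the image
        attn[max(0, r - radius):r + radius + 1,
             max(0, c - radius):c + radius + 1] = 1.0
    return attn


def concat_attention(rgb, attn):
    """Concatenate an HxWx3 image and an HxW map into an HxWx4 float32 input."""
    if rgb.shape[:2] != attn.shape:
        raise ValueError("image and attention map must have the same height and width")
    img = rgb.astype(np.float32)
    if rgb.dtype == np.uint8:
        img /= 255.0  # bring 8-bit pixels into the same [0, 1] range as the map
    return np.concatenate([img, attn[..., None].astype(np.float32)], axis=-1)


def widen_first_conv(weight):
    """Append a 4th input channel to conv weights of shape (out, 3, kh, kw).

    The new channel's filters are initialized to the mean of the RGB filters.
    """
    if weight.ndim != 4 or weight.shape[1] != 3:
        raise ValueError("expected weights of shape (out_channels, 3, kh, kw)")
    extra = weight.mean(axis=1, keepdims=True)
    return np.concatenate([weight, extra], axis=1)


if __name__ == "__main__":
    image = np.zeros((8, 8, 3), dtype=np.uint8)
    attn = build_attention_map(8, 8, [(2, 3), (2, 4)], radius=1)
    print(concat_attention(image, attn).shape)  # (8, 8, 4)
    print(widen_first_conv(np.ones((64, 3, 7, 7))).shape)  # (64, 4, 7, 7)
```

Initializing the extra channel with zeros is a common alternative: it leaves the pretrained network's response unchanged at the start of training, while the attention channel's filters are still learned because their gradients depend on the nonzero attention input.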