IFSENet: Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence

Shreyas Chandgothia, Ardhendu Sekhar, Amit Sethi

TL;DR

IFSENet accepts sparse supervision, in the form of clicks on one or a few support images, and generates masks for both support and query images. On query images from the Pascal and SBD datasets, it approaches the accuracy of previous state-of-the-art few-shot segmentation models with considerably lower annotation effort.

Abstract

Training a computer vision system to segment a novel class typically requires collecting and painstakingly annotating many images with objects from that class. Few-shot segmentation techniques reduce the number of images required to learn to segment a new class, but careful annotations of object boundaries are still required. Interactive segmentation techniques, on the other hand, focus only on incrementally improving the segmentation of one object at a time (typically using clicks given by an expert) in a class-agnostic manner. We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes. Instead of trivially feeding interactive segmentation masks as ground truth to a few-shot segmentation model, we propose IFSENet, which accepts sparse supervision in the form of clicks on one or a few support images and generates masks on support (training, clicked upon at least once) as well as query (test, never clicked upon) images. To trade off effort for accuracy flexibly, images and clicks can be added incrementally to the support set to further improve the segmentation of both support and query images. When tested on query images from the Pascal and SBD datasets, the proposed model approaches the accuracy of previous state-of-the-art few-shot segmentation models with considerably lower annotation effort (clicks instead of maps). It also works well as an interactive segmentation method on support images.

Paper Structure (17 sections, 1 equation, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Architecture of IFSENet. Notation-wise, yellow blocks are operations with learnable parameters, grey blocks are training-free operations, the $\beta$ block is 1x1 conv + ReLU, the resize block is spatial bilinear interpolation, the argmax block operates along the channel dimension, and the expand block makes multiple copies of a 1x1xC vector and stacks them to reach the desired spatial dimensions.
  • Figure 2: Architecture of the support path (see Figure 1). Notation-wise, 'C' is channel concatenation, the $\alpha$ block is 3x3 conv + ReLU, the $\beta$ block is 1x1 conv + ReLU, max pool halves the spatial dimensions, the upconv block doubles the spatial dimensions and halves the channel dimension, the GAP block is spatial global average pooling, and the head block is a 1x1 conv with 2-channel output.
  • Figure 3: Architecture of the query path (see Figure 1). Notation-wise, 'C' is channel concatenation, the $\alpha$ block is 3x3 conv + ReLU, the $\beta$ block is 1x1 conv + ReLU, and the head block is a 1x1 conv with 2-channel output.
  • Figure 4: Visualization of segmentation episodes on the potted plant class. Positive clicks are green dots and negative clicks are red dots in the support images; the segmentation masks are overlaid as blue regions on both support and query images.
  • Figure 5: Results on query predictions for our 1-shot model trained on Pascal-$5^i$ for validation classes (upper panel) and training classes (lower panel).
  • ...and 4 more figures
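
The training-free blocks named in the captions for Figures 1 and 2 (GAP, expand, argmax) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channels-last (H, W, C) layout, the function names, and the toy inputs are all assumptions made for illustration, and the learnable blocks (conv, upconv, head) and the resize block are omitted.

```python
import numpy as np

# Assumption: feature maps are stored channels-last, shape (H, W, C).
# All function names below are hypothetical, chosen to mirror the captions.

def global_average_pool(features):
    """GAP block: average an (H, W, C) map over space, giving (1, 1, C)."""
    return features.mean(axis=(0, 1), keepdims=True)

def expand(vector, height, width):
    """Expand block: copy a (1, 1, C) vector to every spatial position."""
    return np.tile(vector, (height, width, 1))

def argmax_mask(logits):
    """Argmax block: pick the winning channel per pixel, giving (H, W)."""
    return logits.argmax(axis=-1)

# Toy support features: 2x3 spatial grid with 4 channels.
features = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
prototype = global_average_pool(features)   # shape (1, 1, 4)
tiled = expand(prototype, 5, 6)             # shape (5, 6, 4)

# Toy 2-channel head output where channel 1 wins everywhere.
logits = np.stack([np.zeros((5, 6)), np.ones((5, 6))], axis=-1)
mask = argmax_mask(logits)                  # shape (5, 6), all ones
```

Tiling the pooled vector to the target spatial size is what allows a single 1x1xC descriptor to be concatenated channel-wise with a full-resolution feature map, which is the role the 'C' blocks play in the captions.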