One-Shot Learning for Semantic Segmentation

Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, Byron Boots

TL;DR

The paper tackles efficient semantic segmentation of unseen classes by learning a conditioning mechanism that generates FCN parameters from a single labeled support example. A two-branch network uses these parameters to classify dense per-pixel features from a query image, enabling fast one-shot segmentation and a straightforward extension to $k$-shot by combining the per-example masks with a logical OR, without retraining. On the PASCAL-$5^i$ benchmark, the method achieves substantial gains over baselines (notably in the 1-shot setting) and offers strong speed advantages, with pretraining further boosting generalization. The work also introduces a dedicated benchmark for $k$-shot segmentation and demonstrates the practical feasibility of meta-learning for dense prediction tasks.
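
The OR-aggregation step for $k$-shot can be illustrated with a minimal sketch. It assumes each of the $k$ support examples has already produced its own binary mask for the query image. The function name `or_aggregate` and the toy masks are hypothetical and not taken from the paper's code.

```python
def or_aggregate(masks):
    """Combine k binary masks (lists of rows of 0/1) with a pixel-wise logical OR."""
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    combined = [[0] * width for _ in range(height)]
    for mask in masks:
        for r in range(height):
            for c in range(width):
                # A pixel is foreground if any of the k masks marks it as foreground.
                combined[r][c] = combined[r][c] | mask[r][c]
    return combined


# Hypothetical 2x3 predictions for one query image, one per support example (k = 3).
masks = [
    [[1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 0, 1]],
]
fused = or_aggregate(masks)
print(fused)
```

Because the aggregation happens only on the output masks, the same one-shot model can be reused for any $k$ without retraining, at the cost of one forward pass of the conditioning step per support example.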

Abstract

Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster.

Paper Structure

This paper contains 16 sections, 6 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Overview. $S$ is an annotated image from a new semantic class. In our approach, we input $S$ to a function $g$ that outputs a set of parameters $\theta$. We use $\theta$ to parameterize part of a learned segmentation model, which produces a segmentation mask for the query image $I_q$.
  • Figure 2: Model Architecture. The conditioning branch receives an image-label pair and produces a set of parameters $\{w,b\}$ for the logistic regression layer $c(\cdot, w, b)$. The segmentation branch is an FCN that receives a query image as input and outputs strided features of conv-fc7. The predicted mask is generated by classifying the pixel-level features through $c(\cdot, w, b)$, which is then upsampled to the original size.
  • Figure 3: Pretraining Effect on AlexNet.
  • Figure 4: Inference Time (in s).
  • Figure 5: Some qualitative results of our method for $1$-shot. Inside each tile, we have the support set at the top and the query image at the bottom. The support is overlaid with the ground truth in yellow and the query is overlaid with our predicted mask in red.
  • ...and 5 more figures
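
The classification step described in the Figure 2 caption can be sketched as a pixel-wise logistic regression. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `classify_pixels`, the toy feature grid, the 0.5 threshold, and the values of `w` and `b` are all hypothetical, and the final upsampling to the original image size is omitted.

```python
import math


def classify_pixels(features, w, b, threshold=0.5):
    """Apply c(., w, b) to every pixel.

    features: H x W grid of feature vectors (stand-in for the strided FCN features).
    w, b: weights and bias, assumed to come from the conditioning branch.
    Returns an H x W binary mask.
    """
    mask = []
    for row in features:
        mask_row = []
        for f in row:
            logit = sum(wi * fi for wi, fi in zip(w, f)) + b
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            mask_row.append(1 if prob > threshold else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 2x2 feature map with 2-dimensional features per pixel.
features = [
    [[2.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
w, b = [1.0, -1.0], -0.5  # placeholder parameters, not learned values
mask = classify_pixels(features, w, b)
print(mask)
```

In this sketch only `w` and `b` change between semantic classes, while the feature extractor stays fixed, which is the property that lets a single support example re-target the segmentation branch.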