The 2018 DAVIS Challenge on Video Object Segmentation

Sergi Caelles, Alberto Montes, Kevis-Kokitsi Maninis, Yuhua Chen, Luc Van Gool, Federico Perazzi, Jordi Pont-Tuset

TL;DR

The paper presents the 2018 DAVIS Challenge on Video Object Segmentation, which builds on the DAVIS 2017 dataset and adds an interactive teaser track to study scribble-based, time-aware refinement. The primary track remains semi-supervised segmentation, with the same data splits and public training/validation annotations, plus an online test-dev server. Two interactive baselines, Scribble-OSVOS and a Deeplab-ResNet-101 embedding with a per-object linear classifier, demonstrate that scribble supervision can achieve substantial performance gains with far less labeling and waiting time than full supervision. The work introduces a scalable web-service evaluation framework for interactive VOS and advocates evaluating time-accuracy trade-offs to push toward usable, realistic video segmentation systems.

Abstract

We present the 2018 DAVIS Challenge on Video Object Segmentation, a public competition specifically designed for the task of video object segmentation. It builds upon the DAVIS 2017 dataset, which was presented in the previous edition of the DAVIS Challenge and added 100 videos with multiple objects per sequence to the original DAVIS 2016 dataset. Motivated by the analysis of the results of the 2017 edition, the main track of the competition will be the same as in the previous edition (segmentation given the full mask of the objects in the first frame, i.e. the semi-supervised scenario). This edition, however, also adds an interactive segmentation teaser track, in which participants interact with a web service that simulates a human providing scribbles to iteratively improve the result.

Paper Structure

This paper contains 4 sections and 3 figures.

Figures (3)

  • Figure 1: Different levels of interaction in video object segmentation. Top left: unsupervised. Top right: semi-supervised. Bottom: interactive segmentation with different levels of detail.
  • Figure 2: Simulated scribbles. First, the frames are evaluated to obtain true positives (green), false negatives (red) and false positives (blue). Then, for each class, the skeleton tree is computed from the error regions (yellow). The final scribbles are obtained from the longest path in the skeleton tree.
  • Figure 3: Quality vs. timing. Evolution of $\mathcal{J} \& \mathcal{F}$ on the DAVIS 2017 validation set as a function of the available time.
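
The first step in the Figure 2 caption (splitting a prediction into true positives, false negatives and false positives) and the region measure $\mathcal{J}$ from Figure 3 can be illustrated with a minimal sketch. This is not the challenge's evaluation code. The function names `error_maps` and `jaccard` are hypothetical, masks are assumed to be binary arrays for a single object in a single frame, and the value returned when both masks are empty is an assumed convention. The skeletonization and longest-path steps, as well as the contour measure $\mathcal{F}$, are left out.

```python
import numpy as np


def error_maps(pred, gt):
    """Split a predicted binary mask into TP, FN and FP pixel maps.

    pred, gt: array-likes of the same shape, nonzero = object pixel.
    (Hypothetical helper for illustration only.)
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    true_pos = pred & gt     # object pixels that were found
    false_neg = ~pred & gt   # object pixels that were missed
    false_pos = pred & ~gt   # background pixels labelled as object
    return true_pos, false_neg, false_pos


def jaccard(pred, gt):
    """Region similarity J: intersection over union of the two masks."""
    tp, fn, fp = error_maps(pred, gt)
    union = int(tp.sum() + fn.sum() + fp.sum())
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(tp.sum()) / union


# Toy 2x3 frame: 2 TP pixels, 1 FN pixel, 1 FP pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
tp, fn, fp = error_maps(pred, gt)
print(int(tp.sum()), int(fn.sum()), int(fp.sum()))  # 2 1 1
print(jaccard(pred, gt))                            # 0.5
```

In a scribble simulator of the kind the caption describes, the false-negative and false-positive maps would be the error regions passed on to the skeleton step.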