Submodular video object proposal selection for semantic object segmentation

Tinghuai Wang

TL;DR

The paper tackles semantic video object segmentation under weak supervision by learning a data-driven, spatio-temporal representation that aggregates multiple instance proposals across frames. It introduces a submodular track selection method with a facility-location term $\mathcal{H}(\mathcal{D})$ and a discriminative term $\mathcal{P}(\mathcal{D})$ to prune noisy proposals, with the overall objective $\mathcal{E}(\mathcal{D})=\mathcal{H}(\mathcal{D})+\mathcal{P}(\mathcal{D})$ optimized greedily. Object segmentation is performed on a space-time superpixel graph via an energy $E(x)$ combining color and semantic unary potentials and a pairwise term, solved with alpha expansion. Experiments on YouTube-Objects show competitive improvements over state-of-the-art baselines, demonstrating that submodular track selection and cross-frame proposal aggregation can effectively exploit long-range context using pre-trained image recognizers.

Abstract

Learning a data-driven spatio-temporal semantic representation of objects is the key to coherent and consistent labelling in video. This paper proposes to achieve semantic video object segmentation by learning a data-driven representation that captures the synergy of multiple instances across consecutive frames. To prune noisy detections, we exploit the rich information among multiple instances and select a discriminative and representative subset. This selection process is formulated as a facility location problem and solved by maximising a submodular function. Our method retrieves the longer-term contextual dependencies that underpin a robust semantic video object segmentation algorithm. We present extensive experiments on a challenging dataset that demonstrate the superior performance of our approach compared with state-of-the-art methods.
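
The selection step described above can be illustrated with a minimal sketch of greedy maximisation of a facility-location objective plus a per-item score term. This is not the paper's implementation: the exact forms of $\mathcal{H}(\mathcal{D})$ and $\mathcal{P}(\mathcal{D})$ are not given in this summary, so the sketch assumes $\mathcal{H}(\mathcal{D})=\sum_i \max_{j\in\mathcal{D}} s_{ij}$ for a non-negative similarity matrix $s$, and $\mathcal{P}(\mathcal{D})=\lambda\sum_{j\in\mathcal{D}} p_j$ for non-negative scores $p$. The function name `greedy_select`, the `budget` cap, and the toy data are all hypothetical.

```python
import numpy as np


def greedy_select(similarity, scores, budget, weight=1.0):
    """Greedily pick up to `budget` items maximising H(D) + P(D).

    Assumed (not from the paper):
      H(D) = sum_i max_{j in D} similarity[i, j]   (facility location)
      P(D) = weight * sum_{j in D} scores[j]       (discriminative term)
    """
    n = similarity.shape[0]
    selected = []
    available = np.ones(n, dtype=bool)
    coverage = np.zeros(n)  # best similarity of each item to the selected set

    for _ in range(min(budget, n)):
        # Column j of new_cov is the coverage vector if item j were added.
        new_cov = np.maximum(coverage[:, None], similarity)
        gains = new_cov.sum(axis=0) - coverage.sum() + weight * scores
        gains[~available] = -np.inf  # never re-pick an item
        best = int(np.argmax(gains))
        if gains[best] <= 0:  # no remaining item improves the objective
            break
        selected.append(best)
        available[best] = False
        coverage = np.maximum(coverage, similarity[:, best])
    return selected


# Toy data: items 0-2 form one cluster of similar tracks, items 3-4 another.
similarity = np.array([
    [1.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.9, 0.0, 0.0],
    [0.8, 0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 1.0],
])
scores = np.array([0.5, 0.6, 0.4, 0.7, 0.2])
selected = greedy_select(similarity, scores, budget=2)
print(selected)
```

On this toy input the greedy pass picks one representative from each cluster, which is the behaviour a facility-location term is meant to encourage: once a cluster is covered, adding another of its members yields only a small marginal gain.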

Paper Structure (8 sections, 7 equations, 4 figures, 1 table)

Introduction
Object Discovery
Proposal Scoring and Classification
Tracking for Proposal Mining
Submodular Track Selection
Object Segmentation
Experiments
Conclusion

Figures (4)

Figure 1: An illustration of the proposed object discovery strategy.
Figure 2: An illustration of the weighted spatial average pooling strategy.
Figure 3: Iterative tracking to eliminate spurious detections and extract consistent proposals.
Figure 4: Representative successful results by our approach on YouTube-Objects dataset.
