Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation
Tinghuai Wang, Huiling Wang
TL;DR
The paper tackles semantic video object segmentation under challenging motion and appearance variations by learning and propagating higher-level semantic contexts without requiring training data. It introduces an exemplar-based nonparametric approach that builds a similarity graph over regions derived from trajectory hypotheses, uses a link-prediction viewpoint to propagate context, and integrates the results into a fully connected CRF for accurate per-region labeling. Key contributions include a two-pass context propagation strategy to estimate cross-object relationships and a CRF-based inference framework that leverages learned context scores. On the YouTube-Objects benchmark, the method achieves state-of-the-art performance, demonstrating the value of global contextual reasoning for robust video segmentation.
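The link-prediction view of context propagation can be made concrete with a minimal sketch. This is an illustration under stated assumptions, not the authors' algorithm. It assumes three things: each unlabeled region has a non-negative similarity to a set of exemplar regions; pairwise context scores are known between exemplar regions (for example, 1 for "same object" and 0 otherwise); and the two passes propagate along the first and then the second endpoint of each region pair, giving C = P C0 P^T with P the row-normalized similarity matrix. The helper names `row_normalize`, `matmul`, `transpose`, and `propagate_contexts` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(w: Matrix) -> Matrix:
    """Turn a non-negative similarity matrix into row-stochastic weights."""
    out = []
    for row in w:
        s = sum(row)
        out.append([x / s if s > 0 else 0.0 for x in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (inner dimension = len(b))."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def propagate_contexts(sim: Matrix, exemplar_ctx: Matrix) -> Matrix:
    """Hypothetical two-pass propagation C = P C0 P^T.

    sim[i][e]: similarity between unlabeled region i and exemplar region e.
    exemplar_ctx[e][f]: known context score between exemplars e and f.
    """
    p = row_normalize(sim)               # (n_unlabeled x n_exemplar) weights
    first = matmul(p, exemplar_ctx)      # pass 1: context of region i toward exemplar f
    return matmul(first, transpose(p))   # pass 2: context between regions i and j
```

For example, with `sim = [[1, 0], [0, 1], [1, 1]]` and an identity exemplar context, region 2 resembles both exemplars equally and therefore receives a context score of 0.5 with each of the other two regions. When the exemplar context is symmetric, the propagated matrix is symmetric as well.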
Abstract
We propose a novel approach for modeling semantic contextual relationships in videos. This graph-based model enables the learning and propagation of higher-level spatio-temporal contexts to facilitate the semantic labeling of local regions. We introduce an exemplar-based nonparametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarity graph of regions. Contextual relationship learning and propagation are performed to estimate the pairwise contexts between all pairs of unlabeled local regions. Our algorithm integrates the learned contexts into a Conditional Random Field (CRF) in the form of pairwise potentials and infers the per-region semantic labels. Experiments on the challenging YouTube-Objects dataset show that the proposed contextual relationship model outperforms state-of-the-art methods.
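The step of integrating learned contexts into a CRF as pairwise potentials can be illustrated with a small mean-field sketch. The following are all illustrative assumptions: a Potts-style pairwise potential that charges `weight * ctx[i][j]` whenever regions i and j take different labels (with `ctx` holding context scores in [0, 1]), the `weight` parameter, synchronous mean-field updates, and the function names `softmax`, `mean_field`, and `labels`. The paper's actual potentials and inference procedure may differ.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field(unary: List[List[float]], ctx: List[List[float]],
               weight: float = 1.0, iters: int = 10) -> List[List[float]]:
    """Approximate marginals Q[i][a] of a fully connected CRF.

    unary[i][a]: cost of label a at region i (lower is better).
    ctx[i][j]: assumed context score, high if i and j likely share a label.
    Assumed pairwise potential: weight * ctx[i][j] if labels differ, else 0.
    """
    n, k = len(unary), len(unary[0])
    q = [softmax([-u for u in unary[i]]) for i in range(n)]  # init from unaries
    for _ in range(iters):
        new_q = []
        for i in range(n):
            energies = []
            for a in range(k):
                # Expected Potts penalty: probability that j does NOT take label a.
                pair = sum(weight * ctx[i][j] * (1.0 - q[j][a])
                           for j in range(n) if j != i)
                energies.append(unary[i][a] + pair)
            new_q.append(softmax([-e for e in energies]))
        q = new_q  # synchronous (parallel) update of all regions
    return q


def labels(q: List[List[float]]) -> List[int]:
    """Per-region MAP label from the approximate marginals."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

As a usage example, take `unary = [[0.0, 3.0], [3.0, 0.0], [0.6, 0.4]]` and a context matrix linking only regions 0 and 2. With `weight=2.0`, the ambiguous region 2 adopts region 0's label, giving `[0, 1, 0]`. With `weight=0.0`, the model reduces to per-region unaries and region 2 keeps label 1.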
