Dual Prototype Attention for Unsupervised Video Object Segmentation

Suhwan Cho; Minhyeok Lee; Seunghoon Lee; Dogyoon Lee; Heeseung Choi; Ig-Jae Kim; Sangyoun Lee

TL;DR

This paper proposes two prototype-based attention mechanisms for unsupervised video object segmentation: inter-modality attention (IMA), which fuses appearance and motion information, and inter-frame attention (IFA), which fuses information across frames. Both operate through dense propagation across modalities and frames.

Abstract

Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in videos. The primary techniques used in unsupervised VOS are 1) the collaboration of appearance and motion information; and 2) temporal fusion between different frames. This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA), to incorporate these techniques via dense propagation across different modalities and frames. IMA densely integrates context information from different modalities based on mutual refinement. IFA injects the global context of a video into the query frame, enabling full use of useful properties from multiple frames. Experimental results on public benchmark datasets demonstrate that our proposed approach outperforms all existing methods by a substantial margin. The two proposed components are also thoroughly validated through an ablation study.
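The abstract does not spell out how prototypes are built or attended to, so the following is only a minimal sketch of prototype-based attention in a generic sense, not the authors' implementation. It assumes that a feature map is flattened into a list of vectors, that prototypes are soft-assignment weighted means around a few seed vectors, and that each query position attends to those prototypes with a residual update. The function names `make_prototypes` and `prototype_attention`, the seed choice, and the toy data are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_prototypes(features, seeds):
    """Summarize N feature vectors into K prototypes (assumed scheme).

    Each position is softly assigned to the K seeds; prototype k is the
    assignment-weighted mean of all feature vectors.
    """
    assign = [softmax([dot(f, s) for s in seeds]) for f in features]
    channels = len(features[0])
    protos = []
    for k in range(len(seeds)):
        weights = [a[k] for a in assign]
        norm = sum(weights)  # strictly positive, since softmax outputs are > 0
        protos.append([
            sum(w * f[c] for w, f in zip(weights, features)) / norm
            for c in range(channels)
        ])
    return protos


def prototype_attention(query, protos):
    """Let every query position attend to the prototypes (residual update)."""
    scale = 1.0 / math.sqrt(len(query[0]))
    refined = []
    for q in query:
        attn = softmax([dot(q, p) * scale for p in protos])
        context = [
            sum(a * p[c] for a, p in zip(attn, protos))
            for c in range(len(q))
        ]
        refined.append([qc + cc for qc, cc in zip(q, context)])
    return refined


# Toy features: 3 spatial positions, 2 channels per modality (made-up values).
appearance = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]

# Mutual refinement in the spirit of IMA: each modality attends to
# prototypes summarizing the other modality.
motion_protos = make_prototypes(motion, seeds=motion[:2])
appearance_protos = make_prototypes(appearance, seeds=appearance[:2])
refined_appearance = prototype_attention(appearance, motion_protos)
refined_motion = prototype_attention(motion, appearance_protos)
```

Under the same assumptions, an IFA-style use would build the prototypes from the features of other frames in the video and pass the query frame's features as `query`, so that a small set of vectors carries global video context into every query position.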

Paper Structure (15 sections, 14 equations, 6 figures, 4 tables)

Introduction
Related Work
Approach
Problem Formulation
Network Architecture
Inter-Modality Attention (IMA)
Inter-Frame Attention (IFA)
Implementation Details
Experiments
Datasets
Evaluation Metrics
Analysis
Quantitative Results
Qualitative Results
Conclusion

Figures (6)

Figure 1: Visualized feature maps after applying IMA and IFA.
Figure 2: Architecture of our proposed network. Based on a two-stream encoder-decoder architecture, IMA and IFA modules are employed. For simplicity, skip connections between encoding blocks and decoding blocks are omitted in the illustration.
Figure 3: Visualized pipeline of IMA.
Figure 4: Visualized pipeline of IFA.
Figure 5: Visualized activation maps of different IMA versions.
...and 1 more figure
