Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation
Ge Shi, Zhili Yang
TL;DR
The paper tackles video moving-object segmentation by leveraging unsupervised optical flow to guide segmentation. It proposes a two-stage pipeline: first train an unsupervised optical-flow network (UnFlow) and then feed its output into a SegNet encoder-decoder to produce moving-object proposals, trained on DAVIS 2017. Key contributions include fine-tuning UnFlow on DAVIS 2017 and adapting SegNet for two-class motion segmentation, with an implementation in TensorFlow on AWS EC2. The results demonstrate the feasibility of motion-guided segmentation, but reveal limitations due to lack of semantic information and boundary artifacts, suggesting future work to integrate temporal models and semantic cues for improved performance.
Abstract
Dynamic scene understanding is one of the most prominent areas of interest in the computer vision community. Pixel-wise segmentation with neural networks is a widely accepted way to improve it. Recent research on pixel-wise segmentation has combined semantic and motion information and achieved good performance. In this work, we propose a state-of-the-art neural network architecture to obtain moving object proposals (MOP) accurately and efficiently. We first train an unsupervised convolutional neural network (UnFlow) to estimate optical flow. We then feed the output of the optical flow network into a fully convolutional SegNet model. The main contributions of our work are (1) fine-tuning the pretrained optical flow model on the recently released DAVIS dataset; and (2) leveraging a fully convolutional neural network with an encoder-decoder architecture to segment objects. We developed the code in TensorFlow and ran the training and evaluation processes on an AWS EC2 instance.
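The data flow of the two-stage pipeline can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the flow field is a hand-written toy grid rather than UnFlow output, and `stand_in_segmenter` is a hypothetical placeholder that thresholds flow magnitude where the real pipeline would run a trained SegNet. All function names and the threshold value are assumptions made for illustration.

```python
import math
from typing import List, Tuple

# A dense flow field: H x W grid of (u, v) displacements in pixels.
Flow = List[List[Tuple[float, float]]]


def flow_to_features(flow: Flow) -> List[List[Tuple[float, float]]]:
    """Convert each (u, v) flow vector into (magnitude, angle in radians)."""
    return [
        [(math.hypot(u, v), math.atan2(v, u)) for (u, v) in row]
        for row in flow
    ]


def stand_in_segmenter(
    features: List[List[Tuple[float, float]]], threshold: float = 1.0
) -> List[List[int]]:
    """Hypothetical placeholder for the SegNet stage.

    Labels a pixel 1 (moving) when its flow magnitude exceeds `threshold`,
    else 0 (static). A learned encoder-decoder would replace this rule.
    """
    return [[1 if mag > threshold else 0 for (mag, _) in row] for row in features]


def moving_object_proposal(flow: Flow, threshold: float = 1.0) -> List[List[int]]:
    """Stage 1 output (flow) -> features -> two-class motion mask."""
    return stand_in_segmenter(flow_to_features(flow), threshold)


# Toy 2 x 3 flow field standing in for the optical flow network's output.
toy_flow: Flow = [
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.0)],
    [(0.0, 0.0), (3.0, 4.0), (2.0, 0.0)],
]

mask = moving_object_proposal(toy_flow)
print(mask)
```

The point of the sketch is only the interface between the stages: the segmentation stage consumes per-pixel motion and emits a two-class (moving versus static) mask. A fixed threshold cannot use semantic cues or learn object boundaries, which is why the pipeline relies on a trained network for this step.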
