Deep Hybrid Model for Region of Interest Detection in Omnidirectional Videos
Sana Alamgeer, Mylene Farias, Marcelo Carvalho
TL;DR
The paper tackles ROI detection in omnidirectional video to optimize streaming efficiency. It introduces a deep hybrid framework that fuses two pathways: a bottom-up saliency pathway, which uses two CNN+Atrous Convolution streams fed with frame-derived inputs and an optical-flow map, and a semantic saliency pathway based on MPYOLO's multi-projection detections. A fusion stage merges these predictions into a final ROI set. Evaluated on the 360RAT dataset, the model outperforms several state-of-the-art 360° saliency methods. The approach demonstrates strong alignment with subjective ROI annotations and provides a public codebase to enable reproducibility and further development for view-port prediction and adaptive streaming in VR contexts.
Abstract
The main goal of this project is to design a new model that predicts regions of interest (ROIs) in 360$^{\circ}$ videos. ROIs play an important role in 360$^{\circ}$ video streaming: for example, they are used to predict view-ports and to intelligently cut videos for live streaming, so that less bandwidth is used. Detecting view-ports in advance helps reduce head movement while streaming and watching a video on a head-mounted device, whereas intelligent cuts improve the efficiency of streaming the video to users and enhance the quality of their viewing experience. This report covers the secondary task of identifying ROIs, for which we design, train, and test a hybrid saliency model. In this work, we use salient regions to represent the regions of interest. The method consists of three steps: preprocessing the video to obtain frames, developing a hybrid saliency model that predicts the regions of interest, and post-processing the model's output predictions to obtain the final ROIs for each frame. We then compare the performance of the proposed method against the subjective annotations of the 360RAT dataset.
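
The fusion and post-processing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted-average fusion rule, the fixed threshold, the 4-connected region grouping, and the names `fuse_saliency` and `extract_rois` are all assumptions made for illustration. The sketch also ignores the left-right wrap-around of equirectangular frames.

```python
from collections import deque

import numpy as np


def fuse_saliency(bottom_up, semantic, weight=0.5):
    """Weighted average of two saliency maps (assumed fusion rule).

    Each map is min-max normalised to [0, 1] before averaging.
    """
    def normalise(m):
        m = np.asarray(m, dtype=np.float64)
        span = m.max() - m.min()
        return (m - m.min()) / span if span > 0 else np.zeros_like(m)

    return weight * normalise(bottom_up) + (1.0 - weight) * normalise(semantic)


def extract_rois(saliency, threshold=0.5):
    """Threshold a saliency map and return one bounding box per region.

    Boxes are (top, left, bottom, right) with inclusive pixel indices,
    one per 4-connected region whose values are >= threshold.
    """
    mask = np.asarray(saliency) >= threshold
    seen = np.zeros_like(mask, dtype=bool)
    height, width = mask.shape
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over the current region.
            queue = deque([(r, c)])
            seen[r, c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 6x8 "frame": both pathways agree on one region, only one flags the other.
bottom_up = np.zeros((6, 8))
bottom_up[1:3, 1:3] = 1.0
semantic = np.zeros((6, 8))
semantic[1:3, 1:3] = 1.0
semantic[4:6, 5:8] = 1.0

fused = fuse_saliency(bottom_up, semantic)
rois = extract_rois(fused, threshold=0.75)
```

With the stricter threshold of 0.75, only the region that both pathways mark as salient survives; lowering the threshold to 0.5 also keeps the region flagged by the semantic map alone.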
