Deep Hybrid Model for Region of Interest Detection in Omnidirectional Videos

Sana Alamgeer, Mylene Farias, Marcelo Carvalho

TL;DR

The paper tackles ROI detection in omnidirectional video to optimize streaming efficiency. It introduces a deep hybrid framework that fuses a bottom-up saliency pathway, leveraging two CNN+Atrous Convolution streams with frame-derived inputs and an optical-flow map, with a semantic saliency pathway based on MPYOLO's multi-projection detections. A fusion stage merges these predictions into a final ROI set, evaluated on the 360RAT dataset and outperforming several state-of-the-art 360° saliency methods. The approach demonstrates strong alignment with subjective ROI annotations and provides a public codebase to enable reproducibility and further development for view-port prediction and adaptive streaming in VR contexts.

Abstract

The main goal of the project is to design a new model that predicts regions of interest in 360$^{\circ}$ videos. The region of interest (ROI) plays an important role in 360$^{\circ}$ video streaming. For example, ROIs are used to predict view-ports and to cut videos intelligently for live streaming, so that less bandwidth is used. Detecting view-ports in advance helps reduce head movement while streaming and watching a video on a head-mounted device, whereas intelligent cuts improve the efficiency of streaming the video to users and enhance the quality of their viewing experience. This report describes the secondary task of identifying ROIs, for which we design, train, and test a hybrid saliency model. In this work, we use salient regions to represent the regions of interest. The method consists of three steps: preprocessing the video to obtain frames, developing a hybrid saliency model that predicts the regions of interest, and post-processing the model's output predictions to obtain the regions of interest for each frame. We then compare the performance of the proposed method against the subjective annotations of the 360RAT dataset.
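
To make the fusion and post-processing steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the bottom-up pathway yields a saliency map in $[0, 1]$, that the semantic pathway yields pixel-coordinate detection boxes, that the two are fused by a weighted average, and that the ROI is the bounding box of the thresholded result. The function names, the fusion weight, and the threshold are all hypothetical; the fusion rule and post-processing actually used in the paper are not specified in this summary.

```python
import numpy as np

def boxes_to_map(boxes, shape):
    """Rasterise boxes (x0, y0, x1, y1) into a binary map (assumed semantic input)."""
    semantic = np.zeros(shape, dtype=float)
    for x0, y0, x1, y1 in boxes:
        semantic[y0:y1, x0:x1] = 1.0
    return semantic

def fuse_maps(bottom_up, semantic, weight=0.5):
    """Weighted average of two maps in [0, 1]; a hypothetical fusion rule."""
    return weight * bottom_up + (1.0 - weight) * semantic

def roi_from_saliency(saliency, threshold=0.5):
    """Bounding box (x0, y0, x1, y1) of pixels >= threshold, or None if empty."""
    ys, xs = np.nonzero(saliency >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Toy 8x16 "frame": one bottom-up blob and one overlapping detection box.
bottom_up = np.zeros((8, 16))
bottom_up[2:5, 3:7] = 0.8
semantic = boxes_to_map([(4, 2, 8, 6)], bottom_up.shape)

# With threshold 0.6, only the region supported by both pathways survives.
roi = roi_from_saliency(fuse_maps(bottom_up, semantic), threshold=0.6)
print(roi)  # (4, 2, 7, 5)
```

In this toy case, raising the threshold above the contribution of either pathway alone keeps only the pixels on which both pathways agree; a lower threshold would instead return the union of their supports.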

Paper Structure

This paper contains 8 sections, 3 equations, 8 figures, 5 tables, 2 algorithms.

Figures (8)

  • Figure 1: Flowchart of the proposed method of predicting region of interest in 360$^{\circ}$ videos.
  • Figure 2: Illustration of inputs $A$ and $B$ for a random frame of a 360$^{\circ}$ video. (a) Reference frame, (b) $I_{T}$ image of (a), and (c) Optical flow map of (a).
  • Figure 3: Block diagram of the proposed bottom-up saliency model.
  • Figure 4: Illustration of CNN Block in the proposed bottom-up saliency model.
  • Figure 5: Illustration of ACL Block in the proposed bottom-up saliency model.
  • ...and 3 more figures