360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation

Wenxuan Lu, Mengshun Hu, Yansheng Qiu, Liang Liao, Zheng Wang

TL;DR

This work tackles the challenge of interpolation for omnidirectional 360° videos by introducing 360VFI, the first dataset and benchmark for Omnidirectional Video Frame Interpolation (Omni-VFI), and a distortion-aware network that leverages ERP distortion priors. It fuses an ERP-aware feature extractor (DistortionGuard) with a distortion-conditioned frame generator (OmniFTB) to reconstruct intermediate frames, using a distortion map based on latitude to modulate processing. The four motion settings in the dataset enable robust evaluation of interpolation under latitude-dependent distortion, and experiments show state-of-the-art performance, especially for large vertical motions. The contributions offer a practical path toward higher-frame-rate immersive 360° video with improved temporal coherence and reduced distortion artifacts.

Abstract

Head-mounted 360° displays and portable 360° cameras have progressed significantly, giving viewers a realistic and immersive experience. However, many omnidirectional videos have low frame rates that can lead to visual fatigue, and prevailing planar frame interpolation methods are unsuitable for omnidirectional video because they are designed solely for traditional videos. This paper introduces 360VFI, the first dataset and benchmark for Omnidirectional Video Frame Interpolation. We present a practical implementation that feeds a distortion prior from omnidirectional video into the network to modulate distortions. Specifically, we propose a pyramid distortion-sensitive feature extractor that uses the unique characteristics of the equirectangular projection (ERP) format as prior information. Moreover, we devise a decoder that uses an affine transformation to further facilitate the synthesis of intermediate frames. Through our benchmark analysis, we present four different distortion condition scenes in the proposed 360VFI dataset to evaluate the challenges triggered by distortion during interpolation. Experimental results further demonstrate that omnidirectional video interpolation can be effectively improved by modeling omnidirectional distortion.
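To make the idea of an ERP distortion prior concrete, the following is a minimal sketch of a latitude-based distortion map. It is not the paper's formulation: the function name `erp_distortion_map` and the choice of `1 - cos(latitude)` as the distortion measure are assumptions made for illustration. The only property it relies on is the standard one that ERP stretches each row horizontally by a factor of `1 / cos(latitude)`, so distortion is smallest at the equator and largest near the poles.

```python
import numpy as np


def erp_distortion_map(height: int, width: int) -> np.ndarray:
    """Hypothetical helper: per-pixel distortion weight for an ERP frame.

    Returns an array of shape (height, width) with values in [0, 1):
    close to 0 at the equator and approaching 1 at the poles.
    """
    # Latitude of each pixel-row center, from +pi/2 (top) to -pi/2 (bottom).
    rows = np.arange(height, dtype=np.float64)
    latitude = np.pi / 2 - (rows + 0.5) * np.pi / height

    # ERP stretches a row horizontally by 1 / cos(latitude).
    # 1 - cos(latitude) is a bounded proxy for that stretch (an assumption,
    # not the measure used in the paper).
    distortion = 1.0 - np.cos(latitude)

    # Distortion depends only on latitude, so every column is identical.
    return np.repeat(distortion[:, None], width, axis=1)


dmap = erp_distortion_map(8, 16)
```

A map like this could be concatenated with the input frames or used to scale intermediate features. How the paper's DistortionGuard and OmniFTB modules actually consume the prior is not specified in this summary.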

Paper Structure (22 sections, 10 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Left: Traditional Video Frame Interpolation. The inputs are two adjacent planar frames, and the output is a target planar frame. Right: Omnidirectional Video Frame Interpolation. The inputs are two adjacent full field-of-view frames from an omnidirectional video, and the output is a target omnidirectional frame. Original omnidirectional frames are spherical, and the most common format is equirectangular projection (ERP). The spherical and ERP formats can be projected into each other, and our proposed method operates on ERP videos.
  • Figure 2: Examples of Different Scenarios in 360VFI Dataset
  • Figure 3: Input Frames and Optical Flow of Different Settings in 360VFI Dataset. The colored regions in the optical flow image grow larger and more saturated as the motion increases. The motion increases from the easy setting to the extreme setting.
  • Figure 4: Architectural Overview. Our model is an efficient encoder-decoder network. It first extracts less-distorted pyramid context features $\phi_0^l$ and $\phi_1^l$ from the input omnidirectional frames $I_1$, $I_2$ with DistortionGuards, and then gradually refines the bilateral intermediate flow fields $F_{t\rightarrow0}^l$ through the OmniFTB generator until it yields the target frame $I_p$. The figure above illustrates the second-level DistortionGuard and OmniFTB; details are shown below in Figure \ref{fig:encoder} and Figure \ref{fig:decoder}.
  • Figure 5: Illustration of an ERP frame and the distortion condition. The extent of distortion is the most severe in the polar regions.
  • ...and 3 more figures