ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

TL;DR

ResVR addresses a gap in prior omnidirectional image (ODI) rescaling methods, which optimize equirectangular projection (ERP) image quality but neglect the viewport that users actually see on head-mounted displays (HMDs). It jointly learns downscaling and viewport rendering by introducing a discrete pixel sampling strategy and a spherical pixel shape representation, enabling end-to-end training from LR ERP to HR viewport; the VR module uses an implicit neural representation to render viewports directly from LR-ERP features. The method achieves state-of-the-art viewport quality across different fields of view, resolutions, and view directions while maintaining a low transmission bitrate. The framework has the potential to reduce bandwidth for VR streaming services without sacrificing user-perceived image quality.

Abstract

With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content viewed on head-mounted displays (HMDs) is actually a rendered viewport instead of an ERP image. In this work, we emphasize that focusing solely on ERP quality results in inferior viewport visual experiences for users. Thus, we propose ResVR, which is the first comprehensive framework for the joint Rescaling and Viewport Rendering of ODIs. ResVR produces low-resolution (LR) ERP images for transmission while rendering high-quality viewports for users to watch on HMDs. In ResVR, a novel discrete pixel sampling strategy is developed to tackle the complex mapping between the viewport and ERP, enabling end-to-end training of the ResVR pipeline. Furthermore, a spherical pixel shape representation technique is derived from spherical differentiation to significantly improve the visual quality of rendered viewports. Extensive experiments demonstrate that ResVR outperforms existing methods in viewport rendering tasks across different fields of view, resolutions, and view directions while keeping a low transmission overhead.
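
The "complex mapping between the viewport and ERP" mentioned in the abstract can be made concrete with a minimal sketch of the standard inverse gnomonic (rectilinear) projection, which sends each viewport pixel to a position on the ERP image. This is not the paper's sampling algorithm. The function name `viewport_to_erp`, the axis conventions, and the parameter names `lon0`/`lat0` for the view direction are assumptions made for illustration.

```python
import math

def viewport_to_erp(u, v, w_v, h_v, fov_h, fov_v, lon0, lat0, erp_w, erp_h):
    """Map a viewport pixel (u, v) to continuous ERP pixel coordinates (x, y).

    Angles are in radians. Assumed conventions (not from the paper): the camera
    looks along +z, x points right, y points up; longitude grows to the right
    and latitude grows upward.
    """
    # 1. Pixel centre -> point on the tangent image plane at distance 1.
    px = (2.0 * (u + 0.5) / w_v - 1.0) * math.tan(fov_h / 2.0)
    py = (1.0 - 2.0 * (v + 0.5) / h_v) * math.tan(fov_v / 2.0)
    pz = 1.0

    # 2. Pitch the ray up by lat0 (rotation about the x axis).
    y1 = py * math.cos(lat0) + pz * math.sin(lat0)
    z1 = -py * math.sin(lat0) + pz * math.cos(lat0)

    # 3. Yaw the ray by lon0 (rotation about the y axis).
    x2 = px * math.cos(lon0) + z1 * math.sin(lon0)
    z2 = -px * math.sin(lon0) + z1 * math.cos(lon0)

    # 4. Ray direction -> spherical angles.
    lon = math.atan2(x2, z2)
    lat = math.atan2(y1, math.hypot(x2, z2))

    # 5. Spherical angles -> ERP pixel coordinates (pixel centres at integers).
    x = (lon / (2.0 * math.pi) + 0.5) * erp_w - 0.5
    y = (0.5 - lat / math.pi) * erp_h - 0.5
    return x, y


if __name__ == "__main__":
    # Centre of a 2048x1536 viewport with FoV (120, 90) degrees, looking at
    # longitude 0 and latitude 0, on a 4096x2048 ERP image.
    print(viewport_to_erp(1023.5, 767.5, 2048, 1536,
                          math.radians(120), math.radians(90),
                          0.0, 0.0, 4096, 2048))
```

Because a rectangular grid of viewport pixels lands on a curved, non-rectangular region of the ERP image under this mapping, a rectangular viewport and a rectangular ERP patch generally do not correspond to each other.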

Paper Structure

This paper contains 17 sections, 11 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of our proposed ResVR framework. The comprehensive ODI processing of ResVR contains two sequential steps: (1) ODI Downscaling & Compression and (2) Decompression & Viewport Rendering. (a) In the training process, HR ERP patches $\mathbf{I}^\text{HR-ERP-Patch}$ are randomly sampled through our proposed discrete pixel sampling strategy (Algo. \ref{algo:dissamp}) to generate the guided LR patches $\mathbf{I}^\text{LR-ERP-Patch}$, query coordinates $\mathbf{X}_{samp}$, and the set of ground-truth pixels $S_{pix}$. This strategy makes end-to-end training of ResVR feasible in practice. (b) During inference, the trained ResVR model can be directly applied for joint rescaling and viewport rendering of given HR ERP images, from the cloud server to user HMDs.
  • Figure 2: Illustration of the VR module, which consists of an encoder $\mathcal{E}$, a local texture estimator $h_\psi$, and an MLP decoder $\mathcal{D}$. Given query coordinates $\mathbf{X}$, it directly predicts $\hat{\mathbf{I}}^\text{HR-View}$ from $\hat{\mathbf{I}}^\text{LR-ERP}$ without the need to produce an HR ERP image.
  • Figure 3: Training ResVR end-to-end faces challenges due to the mismatch in shapes between the ERP image patch ($\mathbf{I}^\text{HR-ERP-Patch}$) and the viewport ($\mathbf{I}^\text{HR-View}$). In Situation #1, although we obtain $\mathbf{I}^\text{HR-View}$ with a rectangular image shape, its corresponding $\mathbf{I}^\text{LR-ERP-Patch}$ does not have a rectangular image shape, preventing its use in supervising the downscaling process. Situation #2 experiences the opposite issue. Both situations are impractical for training. In contrast, our method utilizes a novel discrete pixel sampling strategy (Algo. \ref{algo:dissamp}) to make end-to-end training feasible.
  • Figure 4: Illustration of our proposed spherical pixel shape representation (SSR) technique. We illustrate using a point $\mathbf{y}$ on the viewport. The inverse mapping is first applied to $\mathbf{y}$ and its eight nearest neighbors to get $\mathbf{x}$ and its neighbors on ERP. Then these points are transformed into sphere coordinates $\{\mathbf{p}_1, \mathbf{p}_2, \cdots, \mathbf{p}_9\}$, which are used for calculating numerical derivatives to estimate the pixel shape representation $\mathbf{s}(\mathbf{x})$, according to the proposed spherical central difference method in Eqs. (\ref{eq:sshape}) and (\ref{eq:sphere}).
  • Figure 5: Comparisons of two rendered viewports from ODI-SR \cite{deng2021lau} ("img_005", top) and SUN 360 \cite{xiao2012recognizing} ("img_046", bottom), with $(\theta=0^\circ, \phi=90^\circ)$ and $(\theta=45^\circ, \phi=180^\circ)$, respectively. The viewports are with FoVs $(F_h, F_v)=(120^\circ, 90^\circ)$ and resolutions $(w_v, h_v)=(2048, 1536)$. "Bic" stands for Bicubic interpolation. Please zoom in for more details.
  • ...and 1 more figure
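
The idea in the Figure 4 caption, estimating a pixel's shape from numerical derivatives taken on the sphere, can be illustrated with a small sketch. It is not the paper's spherical central difference formula, which is not reproduced in this summary. It applies plain central differences to the 3D unit-sphere positions of a 3x3 neighbourhood. The function names, the axis convention, and the helper `erp_neighbourhood` (which builds a regular ERP grid instead of inverse-mapping viewport neighbours) are assumptions made for illustration.

```python
import math

def to_sphere(lon, lat):
    """Unit-sphere point for longitude/latitude in radians (assumed axis convention)."""
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def _norm(vec):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in vec))

def central_difference_shape(grid):
    """Estimate a pixel's extent on the sphere from a 3x3 grid of (lon, lat) pairs.

    grid[i][j] holds the angles of the neighbour in row i, column j, and
    grid[1][1] is the queried point. Returns the lengths of the horizontal and
    vertical central differences, computed on the 3D sphere points.
    """
    p = [[to_sphere(lon, lat) for lon, lat in row] for row in grid]
    # Central difference along the horizontal axis: (right - left) / 2.
    d_u = [(a - b) / 2.0 for a, b in zip(p[1][2], p[1][0])]
    # Central difference along the vertical axis: (bottom - top) / 2.
    d_v = [(a - b) / 2.0 for a, b in zip(p[2][1], p[0][1])]
    return _norm(d_u), _norm(d_v)

def erp_neighbourhood(lon, lat, step):
    """Hypothetical helper: 3x3 grid of angles spaced by `step` radians.

    Row 0 is the top row (higher latitude), column 0 is the left column.
    """
    return [[(lon + (j - 1) * step, lat - (i - 1) * step) for j in range(3)]
            for i in range(3)]


if __name__ == "__main__":
    step = 0.01
    print(central_difference_shape(erp_neighbourhood(0.0, 0.0, step)))
    print(central_difference_shape(erp_neighbourhood(0.0, math.radians(60), step)))
```

With the same angular step, the horizontal extent at 60° latitude comes out at about half of its value at the equator, while the vertical extent is unchanged. This position-dependent pixel footprint is the kind of information a spherical pixel shape representation is meant to capture.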