Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing

Bruno Macchiavello, Camilo Dorea, Edson M. Hung, Gene Cheung, Wai-tian Tan

TL;DR

This work tackles loss-resilient free-viewpoint video conferencing by exploiting the redundancy between two transmitted texture+depth views and optimizing both decoder-side blending and encoder-side reference picture selection (RPS). At the decoder, adaptive blending during DIBR view synthesis weights pixels from the more reliable view more heavily. At the encoder, blocks containing voxels visible in both views are coded more error-resiliently in one view only, through per-block RPS guided by an end-to-end distortion model. A quadratic distortion model captures synthesized-view sensitivity to disparity errors, and a Lagrangian optimization enables efficient, decoupled block-level decisions under bitrate constraints. Experimental results show PSNR gains over a traditional feedback channel of up to 0.82 dB on average at an 8% packet loss rate, and as much as 3 dB for particular frames, along with substantial visual improvements, especially at moderate to high loss rates.
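The decoupled block-level decision mentioned above can be illustrated with a minimal sketch. It assumes each block has a small set of candidate reference choices, each with an already-estimated expected distortion and rate; the names `Candidate`, `pick_per_block`, `recent_ref` and `older_ref`, and all numbers, are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    label: str         # hypothetical name of a reference-picture choice
    distortion: float  # expected synthesized-view distortion (assumed given)
    rate: float        # bits needed to code the block with this choice


def pick_per_block(blocks: Sequence[Sequence[Candidate]], lam: float) -> List[Candidate]:
    """Choose, independently for each block, the candidate minimizing D + lam * R."""
    return [min(cands, key=lambda c: c.distortion + lam * c.rate) for cands in blocks]


# Illustrative numbers only: an older reference is assumed to be more robust
# to loss (lower expected distortion) but more expensive to code.
blocks = [
    [Candidate("recent_ref", 40.0, 100.0), Candidate("older_ref", 10.0, 180.0)],
    [Candidate("recent_ref", 12.0, 90.0), Candidate("older_ref", 9.0, 200.0)],
]
choices = pick_per_block(blocks, lam=0.1)
print([c.label for c in choices])
```

Because the Lagrangian cost is a sum over blocks, each block can be minimized on its own for a fixed multiplier; raising `lam` pushes every block toward its cheaper candidate.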

Abstract

Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint image is commonly synthesized using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints via depth-image-based rendering (DIBR). To maintain high quality of synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene visible to two captured views is sampled and coded twice in the two views. In particular, at the receiver we first develop an error concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliable transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels that are visible in both views are more error-resiliently coded in one view only, given that adaptive blending will erase errors in the other view. Further, synthesized view distortion sensitivities to texture versus depth errors are analyzed, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at an 8% packet loss rate, and by as much as 3 dB for particular frames.
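The receiver-side idea of weighting the more reliable view more heavily can be sketched as follows. This is only an illustration: the inverse-error weighting, the function name `blend_pixel`, and the per-pixel error estimates `err_left` and `err_right` are assumptions made here, and the abstract does not specify the actual weights.

```python
def blend_pixel(left: float, right: float,
                err_left: float, err_right: float,
                eps: float = 1e-6) -> float:
    """Blend two corresponding warped pixel values from the two captured views.

    Weights are inversely proportional to each view's estimated error
    (an assumed rule), so the more reliable view dominates the result.
    """
    w_left = 1.0 / (err_left + eps)
    w_right = 1.0 / (err_right + eps)
    return (w_left * left + w_right * right) / (w_left + w_right)


# Equal reliability gives a plain average of the two views.
print(blend_pixel(100.0, 200.0, err_left=1.0, err_right=1.0))
# A heavily corrupted right view is almost ignored.
print(blend_pixel(100.0, 200.0, err_left=0.0, err_right=1000.0))
```

The `eps` term only guards against division by zero when a view is estimated to be error-free.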

Paper Structure

This paper contains 25 sections, 17 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: A bandwidth-efficient free-viewpoint video streaming system dynamically selects two views for transmission.
  • Figure 2: Motion prediction in differentially coded video causes error propagations from predictor block to target block.
  • Figure 3: Synthesized distortion and quadratic model functions for one pixel are shown in (a), for image Akko and Kayo view 47 shown in (b). The resulting curvatures for the entire image are shown in (c).
  • Figure 4: PSNR per frame for ARPS and RFC at $5\%$ packet loss rate for (a) Pantomime, (b) Kendo and (c) Akko and Kayo.
  • Figure 5: Synthesized views for Pantomime, frame $118$, at $5\%$ packet loss. (a) RFC, (b) RPS1 and (c) ARPS schemes.
  • ...and 2 more figures