Progressive Frame Patching for FoV-based Point Cloud Video Streaming
Tongyu Zong, Yixiang Mao, Chen Li, Yong Liu, Yao Wang
TL;DR
This work tackles the high bandwidth and latency challenges of volumetric point cloud video streaming for 6-DoF viewing. It introduces a sliding-window, multi-round progressive patching framework that exploits octree-based spatial scalability and tile-based FoV adaptation, paired with a view-distance aware tile utility model and an analytical KKT-based water-filling rate allocation. The approach demonstrates robustness to FoV and bandwidth prediction errors and outperforms non-progressive and heuristic baselines on real PCV datasets and traces. The resulting method offers smoother, higher-quality view rendering in dynamic network and viewing conditions, with practical implications for scalable FoV-adaptive PCV streaming.
Abstract
Many XR applications require the delivery of volumetric video to users with six degrees of freedom (6-DoF) movements. The point cloud has become a popular volumetric video format. A dense point cloud frame consumes much higher bandwidth than a 2D or 360-degree video frame, and a user's Field of View (FoV) is more dynamic with 6-DoF movement than with 3-DoF movement. To save bandwidth, FoV-adaptive streaming predicts a user's FoV and only downloads point cloud data falling in the predicted FoV. However, it is vulnerable to FoV prediction errors, which can be significant when a long buffer is used for smooth streaming. In this work, we propose a multi-round progressive refinement framework for point cloud video streaming. Instead of sequentially downloading point cloud frames, our solution simultaneously downloads/patches multiple frames falling into a sliding time-window, leveraging the inherent scalability of octree-based point cloud coding. The optimal rate allocation among all tiles of active frames is solved analytically using the heterogeneous tile rate-quality functions calibrated by the predicted user FoV. Multi-frame downloading/patching simultaneously takes advantage of the streaming smoothness resulting from a long buffer and the FoV prediction accuracy at short buffer lengths. We evaluate our streaming solution using simulations driven by real point cloud videos, real bandwidth traces, and 6-DoF FoV traces of real users. Our solution is robust against bandwidth and FoV prediction errors, and can deliver high and smooth view quality in the face of bandwidth variations and dynamic user and point cloud movements.
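To make the idea of an analytically solved, water-filling style rate allocation more concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes a hypothetical concave tile utility `w_i * log(1 + r_i)`, where `w_i` stands in for an FoV-calibrated tile weight and `r_i` is the rate given to tile `i`; the function name `water_fill` and all parameters are illustrative. Under this assumption, the KKT conditions give `r_i = max(0, w_i / lam - 1)`, and the multiplier `lam` (the "water level") is found by bisection so that the rates sum to the bandwidth budget.

```python
def water_fill(weights, budget, iters=100):
    """Maximize sum_i w_i * log(1 + r_i) subject to sum_i r_i <= budget, r_i >= 0.

    weights: hypothetical per-tile weights (e.g. derived from predicted FoV).
    budget:  total rate available for this download round.
    Returns the per-tile rates as a list of floats.
    """
    if budget <= 0 or not any(w > 0 for w in weights):
        return [0.0] * len(weights)

    def rates(lam):
        # KKT stationarity: w_i / (1 + r_i) = lam for every tile with r_i > 0.
        return [max(0.0, w / lam - 1.0) for w in weights]

    # Total rate decreases as lam grows; at lam = max(weights) every rate is 0.
    lo, hi = 1e-12, max(weights)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(rates(mid)) > budget:
            lo = mid  # allocation exceeds the budget, so raise the multiplier
        else:
            hi = mid  # allocation fits, so try a smaller multiplier
    return rates(hi)


if __name__ == "__main__":
    # Three tiles with assumed weights; the lowest-weight tile receives no rate.
    print(water_fill([4.0, 2.0, 1.0], budget=3.0))
```

In this toy setting, tiles with higher weights receive more rate, and tiles whose weight falls below the water level receive none, which mirrors the intuition that tiles unlikely to be in the FoV can be skipped when bandwidth is scarce.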
