CPSL: Representing Volumetric Video via Content-Promoted Scene Layers
Kaiyuan Hu, Yili Jin, Junhua Liu, Xize Duan, Hong Kang, Xue Liu
TL;DR
CPSL addresses the challenge of delivering scalable immersive video by replacing costly 3D reconstruction with a compact, content-aware 2.5D representation that decomposes each frame into a small set of depth-ordered RGBA layers. The method combines depth–semantic fusion, instance promotion, edge-aware layer matting, and a Dynamic Pixel Strip to maintain boundary continuity, enabling parallax-corrected rendering via depth-based warping and alpha compositing. Temporal coherence is achieved through GOP-based layer propagation and EDC-guided boundary refinement, allowing real-time playback with standard video codecs. Empirical results on monocular datasets show CPSL achieving superior perceptual quality and boundary fidelity while reducing storage and rendering costs severalfold; it also scales to full-scene volumetric video with substantial bitrate savings compared to point-cloud streaming. This work provides a practical, streaming-friendly bridge from 2D video to scalable 2.5D immersive media, with potential for broader deployment in real-time communication and interactive applications.
Abstract
Volumetric video enables immersive and interactive visual experiences by supporting free-viewpoint exploration and realistic motion parallax. However, existing volumetric representations, from explicit point clouds to implicit neural fields, remain costly in capture, computation, and rendering, which limits their scalability for on-demand video and reduces their feasibility for real-time communication. To bridge this gap, we propose Content-Promoted Scene Layers (CPSL), a compact 2.5D video representation that brings the perceptual benefits of volumetric video to conventional 2D content. Guided by per-frame depth and content saliency, CPSL decomposes each frame into a small set of geometry-consistent layers equipped with soft alpha bands and an edge-depth cache that jointly preserve occlusion ordering and boundary continuity. These lightweight, 2D-encodable assets enable parallax-corrected novel-view synthesis via depth-weighted warping and front-to-back alpha compositing, bypassing expensive 3D reconstruction. Temporally, CPSL maintains inter-frame coherence using motion-guided propagation and per-layer encoding, supporting real-time playback with standard video codecs. Across multiple benchmarks, CPSL achieves superior perceptual quality and boundary fidelity compared with layer-based and neural-field baselines while reducing storage and rendering costs severalfold. Our approach offers a practical path from 2D video to scalable 2.5D immersive media.
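
To make the rendering idea concrete, the following is a minimal sketch of parallax-shifted, front-to-back alpha compositing over depth-ordered RGBA layers. It is an illustration under simplifying assumptions, not the CPSL implementation: each layer is assigned a single depth and is shifted horizontally by a whole number of pixels proportional to inverse depth, whereas the abstract describes depth-weighted warping. The names `shift_layer`, `composite_front_to_back`, `render_view`, and `camera_shift` are hypothetical.

```python
from typing import List, Tuple

# (r, g, b, alpha), straight (non-premultiplied) values in [0, 1].
Pixel = Tuple[float, float, float, float]
Layer = List[List[Pixel]]  # indexed as layer[row][column]

CLEAR: Pixel = (0.0, 0.0, 0.0, 0.0)


def shift_layer(layer: Layer, dx: int) -> Layer:
    """Shift a layer horizontally by dx pixels; uncovered pixels are transparent."""
    height, width = len(layer), len(layer[0])
    out = [[CLEAR] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x = x - dx
            if 0 <= src_x < width:
                out[y][x] = layer[y][src_x]
    return out


def composite_front_to_back(layers: List[Layer]) -> Layer:
    """Composite layers ordered nearest-first, tracking remaining transmittance."""
    height, width = len(layers[0]), len(layers[0][0])
    out: Layer = []
    for y in range(height):
        row: List[Pixel] = []
        for x in range(width):
            r = g = b = 0.0
            transmittance = 1.0  # fraction of light not yet blocked
            for layer in layers:
                lr, lg, lb, la = layer[y][x]
                weight = transmittance * la
                r += weight * lr
                g += weight * lg
                b += weight * lb
                transmittance *= 1.0 - la
                if transmittance <= 1e-6:
                    break  # pixel is fully covered; farther layers are hidden
            row.append((r, g, b, 1.0 - transmittance))
        out.append(row)
    return out


def render_view(layers: List[Layer], depths: List[float], camera_shift: float) -> Layer:
    """Assumed simplification: one depth per layer, shift = camera_shift / depth."""
    order = sorted(range(len(layers)), key=lambda i: depths[i])  # nearest first
    warped = [shift_layer(layers[i], round(camera_shift / depths[i])) for i in order]
    return composite_front_to_back(warped)
```

Front-to-back ordering allows the per-pixel loop to stop once a pixel is fully covered, so occluded content in farther layers costs nothing to skip. Near layers move more than far layers for the same `camera_shift`, which is what produces the motion-parallax cue.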
