U-Motion: Learned Point Cloud Video Compression with U-Structured Temporal Context Generation
Tingyu Fan, Yueyu Hu, Ran Gong, Yao Wang
TL;DR
U-Motion compresses dynamic point cloud video (PCV) with an explicit, multi-scale motion estimation and compensation (ME/MC) framework called U-Inter, organized as a U-Net-style hierarchy. U-Inter combines top-down (fine-to-coarse) motion propagation, bottom-up motion predictive coding, and multi-scale group motion compensation. A spatial-temporal predictive coding stage follows it. Together they exploit inter-frame redundancy and the spatial redundancy that remains across scales after prediction, for both geometry and attributes. The codec supports variable bitrate through a global-local entropy strategy. It is evaluated under the MPEG Common Test Conditions (CTC), where it shows notable rate-distortion gains over MPEG G-PCC-GesTM v3.0 and the learning-based Unicorn codec on geometry and attribute streams. By improving temporal prediction, the work moves learned PCV codecs closer to practical use. The authors acknowledge the computational cost of K-NN-based motion processing and identify complexity reduction and multi-frame training as future directions.
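To illustrate why K-NN-based motion processing is costly, here is a minimal sketch of generic K-NN motion compensation. It is not the paper's learned module, and it rests on three assumptions. Each current-frame point carries a motion vector that points into the reference frame. Only one scalar attribute is predicted. Neighbours are blended with inverse-distance weights. The function names `knn` and `knn_motion_compensate` are hypothetical. The brute-force search costs O(N·M) per frame, which is the kind of overhead the TL;DR refers to.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def knn(query: Point, ref_points: Sequence[Point], k: int) -> List[Tuple[float, int]]:
    """Brute-force k nearest neighbours, returned as (distance, index), nearest first.

    Every query scans all reference points, so a frame costs O(N * M).
    """
    dists = [(math.dist(query, r), i) for i, r in enumerate(ref_points)]
    dists.sort()
    return dists[:k]


def knn_motion_compensate(cur_points, motions, ref_points, ref_attrs, k=3, eps=1e-8):
    """Predict one scalar attribute per current point from the reference frame.

    Hypothetical sketch: each current point p is displaced by its motion vector m,
    and the attributes of the k reference points nearest to p + m are blended with
    inverse-distance weights (eps avoids division by zero on exact matches).
    """
    preds = []
    for p, m in zip(cur_points, motions):
        warped = tuple(pi + mi for pi, mi in zip(p, m))
        neighbours = knn(warped, ref_points, k)
        weights = [1.0 / (d + eps) for d, _ in neighbours]
        total = sum(weights)
        preds.append(sum(w * ref_attrs[i] for w, (_, i) in zip(weights, neighbours)) / total)
    return preds


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    attrs = [10.0, 20.0, 30.0]
    # The first point moves exactly onto ref[1]; the second sits midway between ref[0] and ref[1].
    print(knn_motion_compensate([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
                                [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)], ref, attrs, k=2))
```

A spatial index such as a k-d tree or octree would cut the per-query cost. The brute-force version is kept here only to make the complexity explicit.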
Abstract
Point cloud video (PCV) is a versatile 3D representation of dynamic scenes with emerging applications. This paper introduces U-Motion, a learning-based compression scheme for both PCV geometry and attributes. We propose a U-Structured inter-frame prediction framework, U-Inter, which performs explicit motion estimation and compensation (ME/MC) at different scales with varying levels of detail. It integrates Top-Down (Fine-to-Coarse) Motion Propagation, Bottom-Up Motion Predictive Coding and Multi-scale Group Motion Compensation to enable accurate motion estimation and efficient motion compression at each scale. In addition, we design a multi-scale spatial-temporal predictive coding module to capture the cross-scale spatial redundancy remaining after U-Inter prediction. We conduct experiments following the MPEG Common Test Conditions for dense dynamic point clouds and demonstrate that U-Motion achieves significant gains over MPEG G-PCC-GesTM v3.0 and recently published learning-based methods for both geometry and attribute compression.
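The interplay between fine-to-coarse motion propagation and bottom-up motion predictive coding can be illustrated with a toy, non-learned sketch. Several assumptions apply, and none of them comes from the paper:

- Points live on an integer voxel grid, and each coarser scale halves the coordinates.
- A parent's motion is the mean of its children's motions, halved to match the coarser grid.
- Coding proceeds from the coarsest scale upward. Each finer scale is predicted from the reconstructed parent motion, scaled by 2, and only quantized residuals would be entropy coded.

All function names and the quantization `step` are hypothetical. The real U-Inter uses learned features rather than plain averaging.

```python
from collections import defaultdict
from typing import Dict, Tuple

Coord = Tuple[int, int, int]
Vec = Tuple[float, float, float]


def parent_of(c: Coord) -> Coord:
    return (c[0] // 2, c[1] // 2, c[2] // 2)


def propagate_down(motion: Dict[Coord, Vec]) -> Dict[Coord, Vec]:
    """Fine-to-coarse: average child motions into their 2x-downsampled parent,
    then halve them because coordinates at the coarser scale are halved."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for c, m in motion.items():
        p = parent_of(c)
        for a in range(3):
            sums[p][a] += m[a]
        counts[p] += 1
    return {p: tuple(s / (2 * counts[p]) for s in sums[p]) for p in sums}


def predict_from_parent(fine_coords, coarse_motion):
    """Coarse-to-fine prediction: each fine point inherits its parent's motion, scaled by 2."""
    return {c: tuple(2 * v for v in coarse_motion[parent_of(c)]) for c in fine_coords}


def residuals(motion, prediction, step):
    """Quantized residual indices: the only values that would be entropy coded."""
    return {c: tuple(round((m - p) / step) for m, p in zip(motion[c], prediction[c])) for c in motion}


def reconstruct(codes, prediction, step):
    """Decoder-side motion: prediction plus dequantized residual."""
    return {c: tuple(p + q * step for p, q in zip(prediction[c], r)) for c, r in codes.items()}


def encode_pyramid(fine_motion: Dict[Coord, Vec], levels: int = 2, step: float = 0.5):
    """Build the pyramid fine-to-coarse, then code it coarse-to-fine.

    Predictions use reconstructed (not original) parent motion, so an encoder and
    decoder following this scheme would stay in sync.
    """
    pyramid = [fine_motion]
    for _ in range(levels):
        pyramid.append(propagate_down(pyramid[-1]))
    coarsest = pyramid[-1]
    zero = {c: (0.0, 0.0, 0.0) for c in coarsest}
    codes = [residuals(coarsest, zero, step)]
    recon = reconstruct(codes[-1], zero, step)
    for motion in reversed(pyramid[:-1]):
        pred = predict_from_parent(motion, recon)
        codes.append(residuals(motion, pred, step))
        recon = reconstruct(codes[-1], pred, step)
    return codes, recon


if __name__ == "__main__":
    fine = {(x, 0, 0): (2.0, 0.0, 0.0) for x in range(4)}  # uniform translation
    codes, recon = encode_pyramid(fine)
    print(codes)
    print(recon == fine)
```

For a uniform translation, only the coarsest level carries a nonzero residual, and every finer level is predicted exactly. This is the kind of saving that predictive motion coding across scales targets.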
