U-Motion: Learned Point Cloud Video Compression with U-Structured Temporal Context Generation

Tingyu Fan, Yueyu Hu, Ran Gong, Yao Wang

TL;DR

U-Motion tackles the challenge of compressing dynamic point cloud videos by introducing an explicit, multi-scale motion estimation and compensation framework called U-Inter, organized in a U-Net–style hierarchy. It couples top-down motion propagation, bottom-up motion predictive coding, and multi-scale group motion compensation with a spatial-temporal predictive coding stage to capture both inter-frame and cross-scale spatial redundancies for geometry and attributes alike. The approach supports variable bitrate with a global-local entropy strategy and is validated under the MPEG Common Test Condition (CTC), showing rate-distortion gains over MPEG G-PCC-GesTM v3.0 and the learning-based Unicorn on both geometry and attribute streams. The work moves PCV codecs toward practical, scalable implementations with improved temporal prediction, while acknowledging the computational cost of K-NN-based motion processing and outlining directions for reduced complexity and multi-frame training.

Abstract

Point cloud video (PCV) is a versatile 3D representation of dynamic scenes with emerging applications. This paper introduces U-Motion, a learning-based compression scheme for both PCV geometry and attributes. We propose a U-Structured inter-frame prediction framework, U-Inter, which performs explicit motion estimation and compensation (ME/MC) at different scales with varying levels of detail. It integrates Top-Down (Fine-to-Coarse) Motion Propagation, Bottom-Up Motion Predictive Coding and Multi-scale Group Motion Compensation to enable accurate motion estimation and efficient motion compression at each scale. In addition, we design a multi-scale spatial-temporal predictive coding module to capture the cross-scale spatial redundancy remaining after U-Inter prediction. We conduct experiments following the MPEG Common Test Condition for dense dynamic point clouds and demonstrate that U-Motion can achieve significant gains over MPEG G-PCC-GesTM v3.0 and recently published learning-based methods for both geometry and attribute compression.

Paper Structure

This paper contains 51 sections, 11 equations, 12 figures, and 5 tables.

Figures (12)

  • Figure 1: The overall architecture of U-Motion for attribute compression.
  • Figure 2: The network architecture of the U-Inter module.
  • Figure 3: The network architecture for Spatial-Temporal Predictive Coding. For attribute compression, (b) is used for reconstruction; for geometry compression, (a) is used for reconstruction.
  • Figure 4: Y-PSNR performance comparison on attribute (color) compression among our method, Unicorn and G-PCC-GesTM.
  • Figure 5: D1-PSNR performance comparison on lossy geometry compression among our method, Unicorn and D-DPCC. The RD curves of Unicorn and D-DPCC differ from those in their original papers because a different quantization method is used when downsampling 11-bit point clouds to 10-bit. Unicorn and D-DPCC [wang2024versatile1, wang2024versatile2, ijcai2022p126] used $\mathrm{floor}(\cdot)$ for quantization, whereas we follow MPEG's standard, which uses $\mathrm{round}(\cdot)$. We have confirmed this with the authors of Unicorn and D-DPCC.
  • ...and 7 more figures
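
The floor-versus-round difference described in the Figure 5 caption can be illustrated with a minimal sketch. It is not the preprocessing code of any of the cited codecs: the function name `downsample`, the scale factor of 0.5 for 11-bit to 10-bit, the round-half-up convention, and the sample points are all assumptions made for illustration.

```python
import math

def downsample(points, scale=0.5, mode="round"):
    """Quantize integer voxel coordinates onto a coarser grid and merge duplicates.

    Hypothetical helper: `scale=0.5` models 11-bit -> 10-bit downsampling.
    """
    if mode == "floor":
        quantize = lambda v: math.floor(v * scale)
    elif mode == "round":
        # Round half up (assumed convention). Python's built-in round()
        # rounds halves to the nearest even integer, so it is avoided here.
        quantize = lambda v: math.floor(v * scale + 0.5)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # A set removes points that collapse onto the same coarse voxel.
    return sorted({tuple(quantize(c) for c in p) for p in points})

# Made-up 11-bit coordinates, chosen only to expose the difference.
points_11bit = [(0, 0, 0), (1, 1, 1), (2, 3, 5), (100, 201, 302)]

floor_10bit = downsample(points_11bit, mode="floor")
round_10bit = downsample(points_11bit, mode="round")

print(floor_10bit)  # [(0, 0, 0), (1, 1, 2), (50, 100, 151)]
print(round_10bit)  # [(0, 0, 0), (1, 1, 1), (1, 2, 3), (50, 101, 151)]
```

In this toy input the two modes yield different voxel positions and even a different number of occupied voxels, so codecs evaluated on differently quantized inputs are effectively coding different point clouds. That is consistent with the caption's explanation of why the reproduced RD curves differ from the originally published ones.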