TSC-PCAC: Voxel Transformer and Sparse Convolution Based Point Cloud Attribute Compression for 3D Broadcasting
Zixi Guo, Yun Zhang, Linwei Zhu, Hanli Wang, Gangyi Jiang
TL;DR
The paper tackles the heavy bitrate burden of point cloud attributes in 3D broadcasting by proposing TSC-PCAC, an end-to-end framework that combines a voxel Transformer with sparse convolution in a variational autoencoder and adds a TSCM-based channel context module. The two-stage Transformer and Sparse Convolutional Module (TSCM) jointly captures local and global interpoint dependencies, and the channel-wise context mechanism exploits interchannel correlations to improve entropy modeling. Experiments show substantial bitrate savings (an average BD-BR reduction of 38.53% versus Sparse-PCAC, along with PSNR gains) and much lower encoding/decoding times, supporting the value of local/global feature fusion and channel-wise context in point cloud attribute compression. The approach advances learned PCAC by pairing Transformers with sparse convolution, and the code is publicly available for reproducible evaluation.
Abstract
Point clouds have become the mainstream representation for advanced 3D applications such as virtual reality and augmented reality. However, the massive data volume of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) method for 3D broadcasting. First, we present the TSC-PCAC framework, which includes a Transformer and Sparse Convolutional Module (TSCM) based variational autoencoder and a channel context module. Second, we propose a two-stage TSCM, where the first stage models local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling with larger receptive fields. This module effectively extracts global and local interpoint relevance to reduce informational redundancy. Third, we design a TSCM-based channel context module to exploit interchannel correlations, which improves the predicted probability distribution of the quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves average Bjøntegaard Delta bitrate reductions of 38.53%, 21.30%, and 11.19% compared with the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. The encoding/decoding time costs are reduced by up to 97.68%/98.78% on average compared with Sparse-PCAC. The source code and the trained models of TSC-PCAC are available at https://github.com/igizuxo/TSC-PCAC.
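For readers unfamiliar with the Bjøntegaard Delta bitrate figures quoted above, the following is a minimal sketch of how such a number is commonly computed from rate-distortion points. It is not taken from the TSC-PCAC repository. The function names (`fit_cubic`, `integrate_poly`, `bd_rate`), the use of PSNR as the quality axis, and the cubic fit of log-rate against PSNR are assumptions that follow the usual convention; the paper's exact evaluation script may differ.

```python
import math


def fit_cubic(xs, ys):
    """Least-squares cubic fit y ~ c0 + c1*x + c2*x^2 + c3*x^3.

    Needs at least four distinct x values. Returns [c0, c1, c2, c3].
    """
    n = 4
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def integrate_poly(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate difference (percent) of test vs. anchor at equal PSNR.

    Negative values mean the test codec needs fewer bits than the anchor.
    """
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("PSNR ranges of the two curves do not overlap")
    # Shift both curves by the same offset so the fit stays well conditioned.
    mid = 0.5 * (lo + hi)
    anchor_fit = fit_cubic([d - mid for d in anchor_psnr],
                           [math.log(r) for r in anchor_rates])
    test_fit = fit_cubic([d - mid for d in test_psnr],
                         [math.log(r) for r in test_rates])
    # Mean log-rate gap over the overlapping PSNR interval.
    gap = (integrate_poly(test_fit, lo - mid, hi - mid)
           - integrate_poly(anchor_fit, lo - mid, hi - mid)) / (hi - lo)
    return (math.exp(gap) - 1.0) * 100.0
```

Under this sign convention, a reported "38.53% bitrate reduction" corresponds to a BD-rate of about -38.53. As a sanity check on made-up data, a test codec that needs exactly half the anchor's bitrate at every PSNR level yields a value of approximately -50.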
