Point Cloud-Assisted Neural Image Compression

Ziqun Li, Qi Zhang, Xiaofeng Huang, Zhao Wang, Siwei Ma, Wei Yan

TL;DR

The digital representations of images and point clouds are unified, and a point cloud-assisted neural image codec (PCA-NIC) is proposed that uses high-dimensional point cloud information to better preserve image texture and structure.

Abstract

Highly efficient image compression is a critical requirement. In scenarios where multiple modalities of data are captured by different sensors, the auxiliary information from other modalities is not fully leveraged by existing image-only codecs, leading to suboptimal compression efficiency. In this paper, we improve image compression performance with the assistance of point clouds, which are widely adopted in autonomous driving. We first unify the data representation of both modalities to facilitate data processing. Then, we propose the point cloud-assisted neural image codec (PCA-NIC), which enhances the preservation of image texture and structure by utilizing high-dimensional point cloud information. We further introduce a multi-modal feature fusion transform (MMFFT) module to capture more representative image features and to remove redundant information between channels and modalities that is not relevant to the image content. Our work is the first to improve image compression performance using point clouds and achieves state-of-the-art performance.

Paper Structure

This paper contains 12 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Let $I$ denote all features of the image, $I_y$ the image features that a neural network can extract, and $I_n$ the image features it cannot extract. $P$, $P_y$, and $P_n$ are defined analogously for the point cloud. When a neural network extracts features from the image and the point cloud simultaneously, $M_y$ is the total mixed feature that can be extracted, $IP_y$ is the feature shared by the image and the point cloud, $I_{yi}$ is the feature that belongs only to the image, and $P_{yi}$ is the feature that belongs only to the point cloud. Within the mixed feature $M_y$, the features that belong to the image are denoted $M_{img}$, and the features that do not belong to the image are denoted $\overline{M_{img}}$.
  • Figure 2: The unified representation of image and point cloud. The projection transforms the point $P(X_w,Y_w,Z_w)$ into the image coordinates $p(u,v)$.
  • Figure 3: The overall architecture of PCA-NIC. $\downarrow$ denotes down-sampling and $\uparrow$ denotes up-sampling. RS is a residual network. Attn is the attention module of Cheng20. N and M are the channel numbers, equal to 192 and 288, respectively.
  • Figure 4: The left part is the diagram of MMFFT, and the right part is the feature fusion transform without attention mechanisms (MMFFT_no_attn).
  • Figure 5: PSNR-Bit-rate curve and MS-SSIM-Bit-rate curve.
  • ...and 1 more figures
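
The projection named in the Figure 2 caption, from $P(X_w,Y_w,Z_w)$ to $p(u,v)$, can be illustrated with a standard pinhole camera model. The sketch below is not taken from the paper: the function name `project_to_depth_map`, the intrinsic matrix `K`, the extrinsics `R` and `t`, and the choice of a sparse depth map as the image-aligned output are all assumptions made for illustration. The paper's actual unified representation may differ.

```python
import numpy as np

def project_to_depth_map(points_world, K, R, t, height, width):
    """Project N x 3 world points onto the image plane (hypothetical helper).

    Returns a (height, width) array holding the depth of the nearest point
    that lands in each pixel, and 0 where no point lands.
    """
    # World -> camera coordinates: X_c = R @ X_w + t (assumed extrinsics).
    pts_cam = points_world @ R.T + t
    # Discard points behind the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Perspective projection with the assumed intrinsic matrix K.
    uvw = pts_cam @ K.T
    cols = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)  # u, horizontal
    rows = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)  # v, vertical
    # Keep only projections that fall inside the image bounds.
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    rows, cols, depth = rows[inside], cols[inside], pts_cam[inside, 2]
    # When several points hit the same pixel, keep the nearest one.
    depth_map = np.full((height, width), np.inf)
    np.minimum.at(depth_map, (rows, cols), depth)
    depth_map[np.isinf(depth_map)] = 0.0
    return depth_map

# Toy usage with made-up camera parameters.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 2.0],
                   [0.25, 0.0, 1.0],
                   [0.0, 0.0, -1.0]])
depth_map = project_to_depth_map(points, K, R, t, height=48, width=64)
```

A map of this kind has the same spatial layout as the image, so both inputs can be handled by the same 2D convolutional layers. Whether PCA-NIC stores depth, intensity, or other per-point attributes in the projected map is not stated in the text above.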