Sparse Point Clouds Assisted Learned Image Compression

Yiheng Jiang, Haotian Zhang, Li Li, Dong Liu, Zhu Li

TL;DR

This work introduces sparse LiDAR point clouds as a cross-modal cue to boost learned image compression in autonomous driving. By projecting the 3D point cloud to a depth-like map and deriving dense structural features through Point-to-image Prediction (PIP) and Multi-scale Context Mining (MCM), the method can be integrated into existing learned codecs via a Hyper Refiner to improve rate-distortion performance. Across KITTI and Waymo, the approach yields notable BD-Rate reductions, with larger gains for simpler baselines and robust performance under lossy point-cloud conditions. The results demonstrate that inter-modality cues help preserve structural details and enhance reconstruction quality, suggesting practical benefits for multi-sensor autonomous systems.
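For readers unfamiliar with the BD-Rate metric quoted above, the following is a minimal sketch of the commonly used Bjøntegaard calculation: fit a cubic polynomial to log-rate as a function of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the average log-rate gap to a percentage. It is not the authors' evaluation script; the function name `bd_rate` and the sample rate/PSNR points are hypothetical and chosen only for illustration.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec relative to the anchor.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    Assumes at least four rate/PSNR points per codec (cubic fit).
    """
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Shared PSNR interval over which both curves are defined.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Fit log-rate as a cubic in (PSNR - lo); the shift improves conditioning.
    fit_a = np.polyfit(pa - lo, log_ra, 3)
    fit_t = np.polyfit(pt - lo, log_rt, 3)

    # Average each fitted curve over [lo, hi] via its antiderivative.
    span = hi - lo
    avg_a = np.polyval(np.polyint(fit_a), span) / span
    avg_t = np.polyval(np.polyint(fit_t), span) / span

    # Convert the mean log-rate difference into a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical numbers: the test codec uses half the bits at every PSNR.
anchor_rate = [0.1, 0.2, 0.4, 0.8]   # bits per pixel
test_rate = [0.05, 0.1, 0.2, 0.4]
psnr = [30.0, 33.0, 36.0, 39.0]      # dB
print(bd_rate(anchor_rate, psnr, test_rate, psnr))
```

With these made-up points the result should be close to -50, since the test curve is the anchor curve shifted to half the bitrate.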

Abstract

In autonomous driving, a vehicle carries a variety of sensors, each capturing a different modality of the same scene. It is therefore feasible to use data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of exploiting inter-modality correlations to improve image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. From this depth map, we predict camera images. We then use these predicted images to extract multi-scale structural features. These features are incorporated into the learned image compression pipeline as additional information to improve compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently improves performance.
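The first step described above, projecting the sparse 3D point cloud onto the image plane to obtain a sparse depth map, can be sketched as follows. This is an illustrative pinhole-camera sketch, not the authors' code: the function name `project_to_sparse_depth`, the intrinsic matrix `K`, and the 4x4 LiDAR-to-camera transform `T_cam_from_lidar` are assumptions standing in for whatever calibration a given dataset provides.

```python
import numpy as np


def project_to_sparse_depth(points, K, T_cam_from_lidar, height, width):
    """Project LiDAR points (N, 3) to a sparse depth map of shape (height, width).

    Pixels that no point falls on stay 0. When several points land on the
    same pixel, the nearest one is kept.
    """
    points = np.asarray(points, dtype=float)

    # Move the points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Discard points behind the camera.
    cam = cam[cam[:, 2] > 0]

    # Pinhole projection: pixel = K @ [x, y, z], then divide by depth.
    uvw = (K @ cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]

    # Keep only projections that fall inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Write far points first so nearer points overwrite them.
    order = np.argsort(-z)
    depth = np.zeros((height, width), dtype=np.float32)
    depth[v[order], u[order]] = z[order]
    return depth


# Toy example with hypothetical calibration values.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)  # LiDAR and camera frames assumed identical here
cloud = np.array([[0.0, 0.0, 10.0],
                  [0.0, 0.0, 5.0],
                  [0.0, 0.0, -1.0]])
sparse_depth = project_to_sparse_depth(cloud, K, T, height=4, width=4)
print(sparse_depth)
```

Because LiDAR returns are sparse, most pixels in the resulting map remain empty, which is why the framework goes on to predict a dense image from it rather than using the depth map directly.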

Paper Structure

This paper contains 30 sections, 14 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Image compression performance after using point clouds. Using sparse LiDAR point clouds to assist various image compression methods [DBLP:conf/iclr/BalleMSHJ18, he2022elic, jiang2023mlic] can improve BD-Rate performance on the KITTI [Geiger2013IJRR] dataset, with ELIC [he2022elic] as the anchor.
  • Figure 2: Overall architecture of our method. The blue layers in $g_{a}$ represent down-sampling, while in $g_{s}$, they represent up-sampling. The specific configuration of up/down-sample layers depends on the compression network used. The processing layers remain consistent with the original coding methods. For example, HYPER [DBLP:conf/iclr/BalleMSHJ18] uses GDN/IGDNs [DBLP:conf/iclr/BalleLS17] as processing layers, and ELIC [he2022elic] uses Res-blocks [he2016deep] or Attention-blocks [vaswani2017attention].
  • Figure 3: Detailed structure of the Multi-scale Context Mining (MCM) module. The Extraction Layer (EL) consists of one convolutional layer, one res-block, and one attention-block. The Fusion Layer (FL) consists of one convolutional layer and one res-block.
  • Figure 4: (a) Detailed structure of the Point-to-image Prediction (PIP) module. (b) Detailed structure of the Hyper Refiner (HR) module.
  • Figure 5: (a) Directly predicting RGB values from a point cloud is very challenging. (b) Predicting color-transformed images allows the model to learn fine-grained structural features.
  • ...and 10 more figures