LiLa-Net: Lightweight Latent LiDAR Autoencoder for 3D Point Cloud Reconstruction

Mario Resino, Borja Pérez, Jaime Godoy, Abdulla Al-Kaff, Fernando García

TL;DR

LiLa-Net addresses efficient reconstruction of 3D LiDAR point clouds for autonomous driving by learning a compact latent representation through a lightweight autoencoder operating directly on sparse points. The architecture uses a reduced encoder and a single innermost skip connection to balance latent information with skip features, optimizing a Chamfer Distance loss to reconstruct the input cloud. The authors demonstrate strong generalization to unseen objects and cross-domain data (e.g., ShapeNet) and show that the latent space supports competitive classification on ModelNet10/40 without pretraining, indicating transferability. Overall, LiLa-Net provides a practical, resource-efficient approach for real-time point-cloud reconstruction and downstream perception tasks across automotive and synthetic domains.

Abstract

This work proposes a 3D autoencoder architecture, named LiLa-Net, which encodes efficient features from real traffic environments using only LiDAR point clouds. For this purpose, we use a real semi-autonomous vehicle equipped with a Velodyne LiDAR. The system leverages the concept of skip connections to improve performance without requiring the extensive resources of state-of-the-art architectures. Key changes include reducing the number of encoder layers and simplifying the skip connections, while still producing an efficient and representative latent space that allows the original point cloud to be accurately reconstructed. Furthermore, an effective balance has been achieved between the information carried by the skip connections and the latent encoding, leading to improved reconstruction quality without compromising performance. Finally, the model demonstrates strong generalization capabilities, successfully reconstructing objects unrelated to the original traffic environment.
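
To make the data flow concrete, the following is a minimal shape-level sketch of an autoencoder with a single innermost skip connection. It is not the authors' implementation: the layer widths, the max-pooling step that produces the latent vector, the random untrained weights, and the names `init_weights` and `forward` are all assumptions chosen for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_weights(rng, latent_dim=64):
    # Hypothetical layer sizes; the paper's actual widths are not given here.
    shapes = {
        "enc1": (3, 32),
        "enc2": (32, latent_dim),
        "dec1": (2 * latent_dim, 32),
        "dec2": (32, 3),
    }
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}


def forward(points, w):
    """points: (M, 3) array. Returns (latent, reconstruction)."""
    e1 = relu(points @ w["enc1"])                  # (M, 32) per-point features
    skip = relu(e1 @ w["enc2"])                    # (M, latent_dim) innermost features S
    latent = skip.max(axis=0)                      # (latent_dim,) assumed global max-pool
    tiled = np.broadcast_to(latent, skip.shape)    # repeat the latent vector for every point
    fused = np.concatenate([tiled, skip], axis=1)  # (M, 2 * latent_dim) latent + skip features
    d1 = relu(fused @ w["dec1"])                   # (M, 32) decoder layer
    recon = d1 @ w["dec2"]                         # (M, 3) linear output: xyz coordinates
    return latent, recon


rng = np.random.default_rng(0)
weights = init_weights(rng)
cloud = rng.standard_normal((128, 3))
latent, recon = forward(cloud, weights)
print(latent.shape, recon.shape)
```

Only the innermost encoder features are forwarded to the decoder in this sketch, so the decoder sees both the global latent vector and per-point detail without the cost of skip connections at every level.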

Paper Structure

This paper contains 20 sections, 3 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Proposed LiLa-Net Architecture: point clouds are first preprocessed ($F$). Then, multiple encoder layers ($E$) progressively reduce the information to the latent space dimension ($L$). For reconstruction, the latent representation is concatenated with the skip connection features ($S$) at the lowest level, followed by a series of decoder layers ($D$) to obtain the final reconstruction. The loss function is the Chamfer Distance between the reconstructed output and the preprocessed point cloud.
  • Figure 2: Comparison of $R$ under different $S$ configurations using a fixed random latent space. The color gradient along the height axis highlights how the reconstruction worsens with the depth of $S$ in the network.
  • Figure 3: Evolution of Chamfer Distance ($CD$) and Earth Mover’s Distance ($EMD$) with varying training dataset size. Each point is based on 50 independent trainings per dataset size (500 in total). Line plots represent mean values (including outliers), while box plots depict the median and interquartile range.
  • Figure 4: Reconstructed Point Clouds ($R_{3\times M}$) of the Same Scene Using Varying Input Cloud Sizes ($M$).
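
The Chamfer Distance named in the Figure 1 caption as the training loss can be sketched in NumPy as follows. This is one common symmetric variant (mean squared nearest-neighbour distance in each direction, summed); whether the paper squares, averages, or sums the terms in the same way is an assumption here, and the function name `chamfer_distance` is illustrative.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean of squared nearest-neighbour distances,
    computed in both directions and summed.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    a_to_b = d2.min(axis=1).mean()  # each point in a to its nearest point in b
    b_to_a = d2.min(axis=0).mean()  # each point in b to its nearest point in a
    return float(a_to_b + b_to_a)


rng = np.random.default_rng(0)
cloud = rng.standard_normal((256, 3))
noisy = cloud + 0.01 * rng.standard_normal(cloud.shape)
print(chamfer_distance(cloud, cloud))  # identical clouds give 0.0
print(chamfer_distance(cloud, noisy))  # small positive value
```

The dense pairwise matrix costs O(N·M) memory, which is fine for a few thousand points; full LiDAR sweeps would typically need a nearest-neighbour structure such as a k-d tree or a batched GPU implementation.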