AVS-Net: Point Sampling with Adaptive Voxel Size for 3D Scene Understanding

Hongcheng Yang, Dingkang Liang, Dingyuan Zhang, Zhe Liu, Zhikang Zou, Xingyu Jiang, Yingying Zhu

TL;DR

AVS-Net tackles the efficiency-accuracy trade-off in large-scale 3D point cloud understanding by introducing a Voxel Adaptation Module (VAM), which automatically adjusts voxel sizes to match a target downsampling ratio and thereby allows an arbitrary voxel size at each layer. Built around Voxel Set Abstraction (VSA) with Intra-VFE and Inter-VFE components, AVS-Net preserves fine geometric detail while expanding receptive fields through voxel-based neighbor aggregation, all within a dynamic grouping framework. The approach is evaluated on 3D object detection and semantic/part segmentation on Waymo, ScanNet, and ShapeNetPart, where it reports higher accuracy and competitive latency compared with state-of-the-art voxel- and point-based methods. The PI-controlled VAM is shown to drive sampling ratios to convergence, and ablations confirm the contributions of VAM, Inter-VFE, and the VSA design to the overall gains. This work offers a scalable, geometry-preserving sampling paradigm for real-world autonomous systems and large-scale 3D perception.
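
To make the idea of steering a voxel size toward a target downsampling ratio concrete, here is a minimal sketch of a PI loop in plain Python. It is an illustration under stated assumptions, not the paper's VAM: the function names `sampling_ratio` and `adapt_voxel_size`, the log-space error, the gains `kp` and `ki`, and the synthetic Gaussian point cloud are all hypothetical choices made for this example.

```python
import math
import random


def sampling_ratio(points, voxel_size):
    """Fraction of points kept when one sample is drawn per occupied voxel."""
    occupied = {tuple(math.floor(c / voxel_size) for c in p) for p in points}
    return len(occupied) / len(points)


def adapt_voxel_size(points, target_ratio, init_size=0.1, kp=0.1, ki=0.2, steps=60):
    """Hypothetical PI loop on log(voxel_size); gains are illustrative, not from the paper."""
    log_init = math.log(init_size)
    integral = 0.0
    size = init_size
    for _ in range(steps):
        # Positive error: too many voxels survive, so the voxel has to grow.
        error = math.log(sampling_ratio(points, size) / target_ratio)
        integral += error
        # Position-form PI update; working in log space keeps the size positive.
        size = math.exp(log_init + kp * error + ki * integral)
    return size, sampling_ratio(points, size)


# Synthetic test cloud (an assumption for this sketch): 2000 Gaussian points.
rng = random.Random(0)
cloud = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
size, ratio = adapt_voxel_size(cloud, target_ratio=0.25)
print(f"voxel size {size:.3f} keeps {ratio:.1%} of the points")
```

The integral term is what removes the steady-state offset: a purely proportional update would settle at a voxel size whose ratio still differs from the target. Because the occupied-voxel count changes in discrete steps, the measured ratio can only approach the target up to a small jitter.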

Abstract

Recent advances in point cloud learning have enabled intelligent vehicles and robots to better comprehend 3D environments. However, processing large-scale 3D scenes remains challenging, so efficient downsampling methods play a crucial role in point cloud learning. Existing downsampling methods either impose a heavy computational burden or sacrifice fine-grained geometric information. To this end, this paper presents an advanced sampler that achieves both high accuracy and efficiency. The proposed method builds on voxel centroid sampling and addresses the challenges of voxel size determination and the preservation of critical geometric cues. Specifically, we propose a Voxel Adaptation Module that adaptively adjusts voxel sizes with reference to a point-based downsampling ratio. This ensures that the sampling results exhibit a favorable distribution for comprehending various 3D objects or scenes. Meanwhile, we introduce a network compatible with arbitrary voxel sizes for sampling and feature extraction while maintaining high efficiency. The proposed approach is demonstrated on 3D object detection and 3D semantic segmentation. Compared to existing state-of-the-art methods, our approach achieves better accuracy on outdoor and indoor large-scale datasets, e.g., Waymo and ScanNet, with promising efficiency.
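
The voxel centroid sampling that the abstract names as its foundation can be sketched in a few lines. This is a generic illustration of the technique rather than the authors' implementation; the function name `voxel_centroid_sample` and the toy points are assumptions made for this example.

```python
import math
from collections import defaultdict


def voxel_centroid_sample(points, voxel_size):
    """Return one centroid per occupied voxel, in order of first appearance."""
    buckets = defaultdict(list)
    for p in points:
        # Integer grid coordinate of the voxel that contains this point.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    centroids = []
    for members in buckets.values():
        n = len(members)
        # Average each axis over the points that share the voxel.
        centroids.append(tuple(sum(axis) / n for axis in zip(*members)))
    return centroids


points = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.75), (1.5, 0.5, 0.5)]
print(voxel_centroid_sample(points, voxel_size=1.0))
# → [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]
```

The number of samples returned depends entirely on `voxel_size` and on how the points are distributed, which is why choosing the voxel size is the challenge the abstract singles out.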

Paper Structure

This paper contains 15 sections, 12 equations, 7 figures, 12 tables, and 3 algorithms.

Figures (7)

  • Figure 1: (a) Farthest point sampling with a fixed ratio of point numbers, where N denotes the number of points. (b) Voxel sampling with a fixed ratio of voxel sizes, where M denotes the number of voxels. (c) Point sampling with an adaptive ratio of voxel sizes. (d) Proportion of sampled points in different layers. (e) Evaluation on the ScanNet validation set.
  • Figure 2: Schematic diagram of the Voxel Set Abstraction module
  • Figure 3: Scatter and gather operations
  • Figure 4: 2D example of Inter-voxel Query. (a) The spatial distribution of sampled points. (b) (0,0)-(2,2) represents the 2D coordinates of voxels, 0-8 are flattened vc_1d values, and 0,2,7 are empty voxels. (c) Hash table between vc_1d and voxel_index. (d) Neighbor indices and corresponding $group\_id$.
  • Figure 5: Voxel Adaptation Module via the PI control algorithm
  • ...and 2 more figures
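
The inter-voxel query outlined in the Figure 4 caption (flattened coordinates, a hash table from vc_1d to voxel_index, and neighbor indices paired with a group id) can be sketched as follows. This is one hedged reading of the caption, not the paper's code: the row-major flattening, the 3x3 neighborhood, the interpretation of `group_id` as the index of the querying voxel, and both function names are assumptions.

```python
def build_voxel_table(occupied_coords, width):
    """Map flattened coordinate vc_1d -> dense voxel_index for non-empty voxels."""
    table = {}
    for voxel_index, (row, col) in enumerate(occupied_coords):
        table[row * width + col] = voxel_index  # row-major flattening (assumed)
    return table


def inter_voxel_query(occupied_coords, width, height):
    """For each non-empty voxel, collect the non-empty voxels in its 3x3 neighborhood."""
    table = build_voxel_table(occupied_coords, width)
    neighbor_idx, group_id = [], []
    for query_index, (row, col) in enumerate(occupied_coords):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                if 0 <= r < height and 0 <= c < width:
                    hit = table.get(r * width + c)  # None for empty voxels
                    if hit is not None:
                        neighbor_idx.append(hit)
                        group_id.append(query_index)
    return neighbor_idx, group_id


# 3x3 grid as in the caption: flattened cells 0, 2 and 7 are empty.
occupied = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 2)]
neighbors, groups = inter_voxel_query(occupied, width=3, height=3)
print(list(zip(groups, neighbors)))
```

Keeping the result as two flat, equally long lists rather than a padded per-voxel array suits voxels with varying neighbor counts, since features can then be reduced per `group_id` without padding.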