3D Object Detection from Point Cloud via Voting Step Diffusion

Haoran Hou, Mingtao Feng, Zijie Wu, Weisheng Dong, Qing Zhu, Yaonan Wang, Ajmal Mian

TL;DR

This work formulates voting as generating new points in the high-density region of the distribution of object centers. A noise-conditioned score network estimates the score function of that distribution, and random 3D points are then moved toward the high-density region according to the estimated score.

Abstract

3D object detection is a fundamental task in scene understanding. Numerous research efforts have been dedicated to better incorporating Hough voting into the 3D object detection pipeline. However, due to the noisy, cluttered, and partial nature of real 3D scans, existing voting-based methods often receive votes from the partial surfaces of individual objects together with severe noise, leading to sub-optimal detection performance. In this work, we focus on the distributional properties of point clouds and formulate the voting process as generating new points in the high-density region of the distribution of object centers. To achieve this, we propose a new method that moves random 3D points toward the high-density region of the distribution by estimating its score function with a noise-conditioned score network. Specifically, we first generate a set of object center proposals to coarsely identify the high-density region of the object center distribution. To estimate the score function, we perturb the generated object center proposals by adding normalized Gaussian noise, and then jointly estimate the score functions of all perturbed distributions. Finally, we generate new votes by moving random 3D points to the high-density region of the object center distribution according to the estimated score function. Extensive experiments on two large-scale indoor 3D scene datasets, SUN RGB-D and ScanNet V2, demonstrate the superiority of our proposed method. The code will be released at https://github.com/HHrEtvP/DiffVote.
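
The last step of the abstract, moving random 3D points toward the high-density region along the score function, can be illustrated with a small NumPy sketch. It is not the authors' implementation: the learned noise-conditioned score network is replaced here by the closed-form score of an equal-weight Gaussian mixture centred on a few hypothetical object centers, and the names `mixture_score` and `move_points`, the noise levels, and the step size are all assumptions made for illustration.

```python
import numpy as np

def mixture_score(x, centers, sigma):
    """Score (gradient of the log-density) of an equal-weight isotropic
    Gaussian mixture centred on `centers`, evaluated at the points `x`.
    Stand-in for a learned score network; x is (N, 3), centers is (M, 3)."""
    diff = centers[None, :, :] - x[:, None, :]        # (N, M, 3), c_m - x_n
    sq_dist = np.sum(diff ** 2, axis=-1)              # (N, M)
    log_w = -sq_dist / (2.0 * sigma ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)         # stabilise the softmax
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)                 # mixture responsibilities
    return np.einsum("nm,nmd->nd", w, diff) / sigma ** 2

def move_points(points, centers, sigmas, steps_per_level=20, eps=0.5):
    """Gradient ascent on the log-density, annealed over decreasing noise
    levels. The step size eps * sigma**2 is an assumed schedule."""
    x = np.array(points, dtype=float)
    for sigma in sigmas:
        step = eps * sigma ** 2
        for _ in range(steps_per_level):
            x = x + step * mixture_score(x, centers, sigma)
    return x

# Hypothetical scene with two object centers and 64 random starting points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
random_points = rng.uniform(-1.0, 4.0, size=(64, 3))
votes = move_points(random_points, centers, sigmas=[1.0, 0.5, 0.25, 0.1])
```

Starting from a large noise level gives points far from any center a useful gradient, while the smaller levels sharpen the final positions; in the paper the score at each level would come from the trained network rather than from known centers.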

Paper Structure

This paper contains 20 sections, 19 equations, 9 figures, 9 tables, and 2 algorithms.

Figures (9)

  • Figure 1: The votes generated by VoteNet usually suffer from partial coverage of the object surfaces and outliers from the cluttered background and adjacent objects.
  • Figure 2: Illustration of our proposed method. We first estimate the score function of the distribution of object centers. Then, we perform gradient ascent to move points to the high-density region of the distribution.
  • Figure 3: Overview of our proposed method. We first use a PointNet++ backbone to extract point-wise features and generate a set of object center proposals. Random noise is then added to the generated proposals to corrupt them. We propose a multi-scale score estimation module that predicts the added noise, conditioned on the input point cloud. Finally, in our score-aware object proposal module, we perform gradient ascent to denoise the perturbed object center proposals and generate 3D bounding boxes.
  • Figure 4: Instances of adding un-normalized noise. The perturbed object center proposals in both instances clearly fail to achieve uniform spatial coverage.
  • Figure 5: Comparison of the voting mechanism (Voting) with the proposed noise conditioned score network (NCSN) on the ScanNet V2 val set.
  • ...and 4 more figures
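
The training signal described in the Figure 3 caption (corrupt the center proposals with random noise, then predict the added noise) can be sketched as a denoising objective averaged over several noise levels. This is a minimal sketch under stated assumptions, not the paper's loss: it uses plain Gaussian noise rather than the normalized noise the paper describes, it omits the conditioning on the input point cloud, and `perturb`, `noise_prediction_loss`, and the `predict_noise` callable are hypothetical names. For Gaussian perturbation with standard deviation sigma, the score of the perturbation kernel at a noisy sample equals the negative added noise divided by sigma, so predicting the noise is equivalent to estimating the score up to that scaling.

```python
import numpy as np

def perturb(centers, sigma, rng):
    """Add isotropic Gaussian noise of std `sigma` to the (M, 3) proposals.
    Returns the noisy proposals and the unit-variance noise that was drawn."""
    z = rng.standard_normal(centers.shape)
    return centers + sigma * z, z

def noise_prediction_loss(predict_noise, centers, sigmas, rng):
    """Mean squared error between predicted and true noise, averaged jointly
    over all noise levels. `predict_noise(noisy, sigma)` is a placeholder
    for a network and must return an array shaped like `noisy`."""
    per_level = []
    for sigma in sigmas:
        noisy, z = perturb(centers, sigma, rng)
        pred = predict_noise(noisy, sigma)
        per_level.append(np.mean(np.sum((pred - z) ** 2, axis=-1)))
    return float(np.mean(per_level))

# Usage with a trivial predictor that always outputs zero noise.
rng = np.random.default_rng(0)
proposals = np.zeros((8, 3))
baseline = noise_prediction_loss(lambda noisy, sigma: np.zeros_like(noisy),
                                 proposals, [1.0, 0.5, 0.1], rng)
```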