Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection
Jiacheng Deng, Jiahao Lu, Tianzhu Zhang
TL;DR
Diff3DETR tackles semi-supervised 3D object detection by integrating diffusion-based pseudo-label generation into a DETR framework under a mean-teacher setup. It introduces an agent-based object query generator to balance sampling locations and content embeddings, and a box-aware denoising module that leverages DDIM denoising and long-range transformer attention to progressively refine noisy boxes. The approach yields diverse and high-quality pseudo-labels and improved bounding box accuracy, outperforming state-of-the-art methods on ScanNet and SUN RGB-D with limited labeled data. This work demonstrates that diffusion-based DETR architectures can effectively leverage unlabeled 3D data for robust scene understanding with practical impact on indoor robotics and AR/VR.
Abstract
3D object detection is essential for understanding 3D scenes. Contemporary techniques often require extensive annotated training data, yet obtaining point-wise annotations for point clouds is time-consuming and laborious. Recent developments in semi-supervised methods seek to mitigate this problem by employing a teacher-student framework to generate pseudo-labels for unlabeled point clouds. However, these pseudo-labels frequently suffer from insufficient diversity and inferior quality. To overcome these hurdles, we introduce an Agent-based Diffusion Model for Semi-supervised 3D Object Detection (Diff3DETR). Specifically, an agent-based object query generator is designed to produce object queries that effectively adapt to dynamic scenes while striking a balance between sampling locations and content embedding. Additionally, a box-aware denoising module utilizes the DDIM denoising process and the long-range attention in the transformer decoder to refine bounding boxes incrementally. Extensive experiments on ScanNet and SUN RGB-D datasets demonstrate that Diff3DETR outperforms state-of-the-art semi-supervised 3D object detection methods.
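To make the incremental refinement concrete, the following is a minimal sketch of the standard deterministic DDIM update (eta = 0) applied to a vector of box parameters. It is not the authors' implementation. The six-value box layout (center and size), the noise schedule values, and the `predict_x0` callable, which stands in for the transformer decoder's clean-box prediction, are all assumptions made for illustration.

```python
import math


def ddim_step(x_t, x0_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a flat list of box parameters.

    alpha_bar_t must be strictly below 1 so the implied noise is well defined.
    """
    sqrt_ab_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_ab_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_ab_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_ab_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for xt_i, x0_i in zip(x_t, x0_pred):
        # Noise implied by the current sample and the predicted clean box.
        eps_i = (xt_i - sqrt_ab_t * x0_i) / sqrt_one_minus_ab_t
        # Re-noise the predicted clean box to the next (lower) noise level.
        x_prev.append(sqrt_ab_prev * x0_i + sqrt_one_minus_ab_prev * eps_i)
    return x_prev


def refine_box(noisy_box, predict_x0, alpha_bars):
    """Run DDIM steps along alpha_bars, ordered from noisiest to cleanest.

    predict_x0 is a hypothetical stand-in for the learned box predictor;
    it receives the current box and noise level and returns a clean-box guess.
    """
    x = list(noisy_box)
    for ab_t, ab_prev in zip(alpha_bars[:-1], alpha_bars[1:]):
        x0_pred = predict_x0(x, ab_t)
        x = ddim_step(x, x0_pred, ab_t, ab_prev)
    return x


if __name__ == "__main__":
    # Assumed layout: (cx, cy, cz, w, h, d). Values are made up.
    target = [1.0, 2.0, 0.5, 0.4, 0.6, 0.3]
    noise = [0.3, -0.2, 0.1, 0.05, -0.1, 0.2]
    start_ab = 0.1
    noisy = [
        math.sqrt(start_ab) * t + math.sqrt(1.0 - start_ab) * n
        for t, n in zip(target, noise)
    ]
    # An oracle predictor, used only to show the mechanics of the update.
    refined = refine_box(noisy, lambda x, ab: target, [start_ab, 0.5, 1.0])
    print(refined)
```

With a schedule that ends at a cumulative alpha of 1.0, the final step returns the predictor's clean-box estimate, so the quality of the refined box depends entirely on how good that predictor is. In the sketch the predictor is an oracle; in practice it would be a learned model.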
