GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024
Xingyu Liu, Gu Wang, Chengxi Li, Yingyue Li, Chenyangguang Zhang, Ziqin Huang, Xiangyang Ji
TL;DR
The paper tackles model-free unseen object detection for open-world mixed reality (MR), learning unseen objects from short onboarding videos instead of CAD models. It introduces GFreeDet, which reconstructs a Gaussian object from the onboarding frames via Gaussian splatting and renders $N_\text{T}=162$ Gaussian-based templates. It then uses DINOv2 and SAM to perform zero-shot, instance-segmentation-based matching of test-image proposals against these templates: the top $K=5$ templates by global-descriptor similarity are kept and then compared using local descriptors. Key contributions are a unified Gaussian-based pipeline for object reconstruction and template rendering, and a descriptor-based template matching framework that combines global and local features to produce amodal 2D detections, evaluated on the BOP-H3 benchmark. On HOT3D, HOPEv2, and HANDAL, GFreeDet achieves a competitive $AP_{H3}$ (≈31.9%), and its fast variant (using FastSAM) delivers superior speed. GFreeDet won the best overall method and best fast method awards in the model-free 2D detection track, demonstrating the viability of model-free detection for mixed reality applications.
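To make the template and matching steps concrete, here is a minimal Python/NumPy sketch under stated assumptions; it is not GFreeDet's implementation. First, it assumes the $N_\text{T}=162$ templates are rendered from camera viewpoints at the vertices of a twice-subdivided icosphere, a common choice because that sphere has exactly 162 vertices. Second, `match_proposal` keeps the top $K=5$ templates by cosine similarity of global descriptors and then scores local descriptors by averaging each proposal patch's best match. The local rule, the equal-weight fusion, and every function name here are hypothetical. Random arrays stand in for DINOv2 features.

```python
import numpy as np


def icosphere_vertices(subdivisions: int = 2) -> np.ndarray:
    """Unit-sphere vertices of a subdivided icosahedron (12, 42, 162, ... points)."""
    t = (1.0 + 5.0 ** 0.5) / 2.0
    base = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
            (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
            (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    verts = [np.asarray(v, dtype=float) / np.linalg.norm(v) for v in base]
    for _ in range(subdivisions):
        midpoints = {}  # edge (i, j) with i < j -> index of its projected midpoint

        def midpoint(i, j):
            key = (min(i, j), max(i, j))
            if key not in midpoints:
                m = verts[i] + verts[j]
                verts.append(m / np.linalg.norm(m))  # project onto unit sphere
                midpoints[key] = len(verts) - 1
            return midpoints[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return np.stack(verts)


def template_camera_centers(radius: float = 1.0) -> np.ndarray:
    """Assumed layout: N_T = 162 camera positions looking at the object center."""
    return radius * icosphere_vertices(subdivisions=2)


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def match_proposal(prop_global, tmpl_global, prop_patches, tmpl_patches, k=5):
    """Score one proposal against N_T templates (hypothetical scoring rule).

    prop_global: (D,)     tmpl_global: (N_T, D)
    prop_patches: (P, D)  tmpl_patches: (N_T, Q, D)
    """
    # Global stage: cosine similarity to every template, keep the top-k.
    sims = _l2_normalize(tmpl_global) @ _l2_normalize(prop_global)
    top = np.argsort(-sims)[:k]
    global_score = float(sims[top].mean())

    # Local stage: for each kept template, average the best cosine match
    # of every proposal patch among that template's patches.
    p = _l2_normalize(prop_patches)
    local = [(p @ _l2_normalize(tmpl_patches[i]).T).max(axis=1).mean() for i in top]
    local_score = float(np.mean(local))

    # Equal-weight fusion is an assumption, not the paper's formula.
    return top, global_score, local_score, 0.5 * (global_score + local_score)
```

For example, `template_camera_centers(0.5).shape` is `(162, 3)`. The icosphere is used here because it spreads viewpoints nearly uniformly over the sphere, so no viewing direction is over-represented among the templates.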
Abstract
We present GFreeDet, an unseen object detection approach that leverages Gaussian splatting and vision foundation models under the model-free setting. Unlike existing methods that rely on predefined CAD templates, GFreeDet reconstructs objects directly from reference videos using Gaussian splatting, enabling robust detection of novel objects without prior 3D models. Evaluated on the BOP-H3 benchmark, GFreeDet achieves performance comparable to CAD-based methods, demonstrating the viability of model-free detection for mixed reality (MR) applications. Notably, GFreeDet won the best overall method and the best fast method awards in the model-free 2D detection track at BOP Challenge 2024.
