OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation

Muhammad Rameez ur Rahman, Piero Simonetto, Anna Polato, Francesco Pasti, Luca Tonin, Sebastiano Vascon

TL;DR

OpenNav introduces a zero-shot 3D object detection pipeline for smart wheelchair navigation. It combines an open-vocabulary 2D detector with a mask generator, depth isolation, and per-object point-cloud reconstruction to produce 3D bounding boxes from RGB-D data. The approach achieves state-of-the-art mAP on the Replica dataset, significantly outperforming competitors at relaxed IoU thresholds (mAP25 and mAP50), and is validated in preliminary real-world experiments on a semi-autonomous wheelchair running ROS2. Its key contributions are a lightweight, open-vocabulary-driven 3D perception pipeline that does not require storing a full-scene point cloud, and a practical integration for assistive navigation with promising runtime efficiency. The work highlights the practical potential of pretrained open-vocabulary models for flexible, map-free target discovery in dynamic environments. Its limitations are a dependence on segmentation and detection quality and the assumption of a single instance per class; future work aims to handle multiple objects per class.
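
The relaxed IoU thresholds refer to how much a predicted 3D box must overlap a ground-truth box to count as a correct detection: mAP25 accepts matches with IoU of at least 0.25, mAP50 requires at least 0.5. The sketch below illustrates only this matching criterion under the assumption of axis-aligned boxes given as (min corner, max corner); the function names are hypothetical, it is not the paper's evaluation code, and the rest of the mAP computation (ranking by score, integrating precision over recall) is omitted.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Box3D = Tuple[Vec3, Vec3]  # (min corner, max corner) of an axis-aligned box


def box_volume(box: Box3D) -> float:
    """Volume of an axis-aligned box, or 0.0 if it is empty along any axis."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    # The intersection is itself an axis-aligned box (possibly empty).
    lo = tuple(max(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(min(a[1][i], b[1][i]) for i in range(3))
    inter = box_volume((lo, hi))
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box3D, gt: Box3D, iou_threshold: float) -> bool:
    """A prediction counts as a true positive if its IoU reaches the threshold."""
    return iou_3d(pred, gt) >= iou_threshold


if __name__ == "__main__":
    gt = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    pred = ((0.5, 0.0, 0.0), (1.5, 1.0, 1.0))  # same size, shifted 0.5 m along x
    print(round(iou_3d(pred, gt), 3))  # 0.333
    print(is_match(pred, gt, 0.25), is_match(pred, gt, 0.5))  # True False
```

Here the prediction overlaps half of the ground-truth cube, giving an IoU of 1/3: it counts at the 0.25 threshold but not at 0.5, which is why scores at mAP25 can be noticeably higher than at mAP50 for the same detections.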

Abstract

Open vocabulary 3D object detection (OV3D) enables precise and extensible object recognition, which is crucial for adapting to the diverse environments encountered in assistive robotics. This paper presents OpenNav, a zero-shot 3D object detection pipeline for smart wheelchairs based on RGB-D images. Our pipeline integrates an open-vocabulary 2D object detector with a mask generator for semantic segmentation, followed by depth isolation and point cloud construction to create 3D bounding boxes. The smart wheelchair uses these 3D bounding boxes to identify potential targets and navigate safely. We demonstrate OpenNav's performance through experiments on the Replica dataset and report preliminary results with a real wheelchair. On Replica, OpenNav significantly improves on the state of the art at mAP25 (+9 pts) and mAP50 (+5 pts), with a marginal improvement at mAP. The code is publicly available at https://github.com/EasyWalk-PRIN/OpenNav.
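
The depth isolation and point cloud construction steps can be illustrated with standard pinhole back-projection: a pixel (u, v) with depth z maps to x = (u - cx) * z / fx and y = (v - cy) * z / fy in camera coordinates. The sketch below is a minimal, dependency-free illustration that assumes a binary object mask aligned with the depth image, metric depth, and known intrinsics (fx, fy, cx, cy). The function names are hypothetical, and it fits an axis-aligned box, which may differ from the box parameterization OpenNav actually uses.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_masked_depth(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Isolate the depth pixels inside one object's mask and lift them to 3D.

    Assumes a pinhole camera with intrinsics (fx, fy, cx, cy), depth in metres,
    and a mask with the same height and width as the depth image. Pixels with
    no depth reading (z <= 0) are skipped.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def axis_aligned_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return the tightest axis-aligned box (min corner, max corner) around the points."""
    if not points:
        raise ValueError("cannot fit a box to an empty point cloud")
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


if __name__ == "__main__":
    depth = [[2.0, 2.0], [0.0, 2.0]]  # 0.0 marks a missing depth reading
    mask = [[True, False], [True, True]]  # object pixels from the mask generator
    cloud = backproject_masked_depth(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    print(cloud)  # [(0.0, 0.0, 2.0), (2.0, 2.0, 2.0)]
    print(axis_aligned_box(cloud))  # ((0.0, 0.0, 2.0), (2.0, 2.0, 2.0))
```

Because only the masked pixels of each detected object are lifted to 3D, this kind of per-object reconstruction avoids building and storing a point cloud of the whole scene.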

Paper Structure

This paper contains 28 sections, 10 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A wheelchair user specifies a desired object through a prompt. A 3D object recognition system then scans the environment to locate the specified object. Once the object is identified, the wheelchair navigates smoothly toward it.
  • Figure 2: Open-vocabulary 3D object detection and reconstruction pipeline. The pipeline takes its input from an RGB-D camera. The RGB image and a textual description of the desired class are fed into a 2D object detector, which generates 2D bounding boxes. The mask generator then produces a semantic mask for the object in each bounding box. The masks are filtered, and the depth is isolated using the masks to reconstruct each object's point cloud. Finally, a 3D bounding box is created around each object's point cloud.
  • Figure 3: From RGB-D input to 3D bounding box generation, using input from a RealSense camera.
  • Figure 4: Instance segmentation results on Replica.
  • Figure 5: Failure case. Large objects that are not entirely visible in a single view may be poorly detected, as happens with the pillow (red circle). Such objects are then broken into separate segments in the subsequent stages, resulting in different instances.
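
The Figure 2 caption states that the masks are filtered before depth isolation but does not say how. The sketch below shows one plausible filter, purely as an assumption: discard masks below a minimum pixel area, then drop depth samples that lie far from the object's median depth, measured in units of the median absolute deviation (MAD). This removes background pixels that leak in at mask boundaries. The function names and the thresholds `min_area`, `k`, and `min_scale` are hypothetical, not values from the paper.

```python
import statistics
from typing import List, Sequence


def mask_area(mask: Sequence[Sequence[bool]]) -> int:
    """Count foreground pixels in a binary mask."""
    return sum(1 for row in mask for pixel in row if pixel)


def keep_mask(mask: Sequence[Sequence[bool]], min_area: int = 50) -> bool:
    """Hypothetical filter: reject masks too small to be a reliable object."""
    return mask_area(mask) >= min_area


def trim_depth_outliers(
    depths: Sequence[float], k: float = 3.0, min_scale: float = 0.01
) -> List[float]:
    """Keep valid depths (> 0) within k robust deviations of the median.

    The spread is the median absolute deviation (MAD), floored at min_scale
    metres so that a nearly flat surface does not reject every other sample.
    Kept samples preserve their input order.
    """
    valid = [z for z in depths if z > 0]
    if not valid:
        return []
    med = statistics.median(valid)
    mad = statistics.median([abs(z - med) for z in valid])
    threshold = k * max(mad, min_scale)
    return [z for z in valid if abs(z - med) <= threshold]


if __name__ == "__main__":
    mask = [[True, False], [True, True]]
    print(mask_area(mask), keep_mask(mask, min_area=3))  # 3 True
    # 3.5 m is background leaking through the mask edge; 0.0 is a missing reading.
    print(trim_depth_outliers([1.0, 1.02, 0.99, 1.01, 3.5, 0.0]))  # [1.0, 1.02, 0.99, 1.01]
```

A median-based rule is used here because a handful of background pixels can shift a mean substantially but barely move the median, so the object's own depth samples survive the trimming.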