Autonomous Marker-less Rapid Aerial Grasping

Erik Bauer, Barnabas Gavin Cangan, Robert K. Katzschmann

TL;DR

This work proposes a vision-based system for autonomous rapid aerial grasping which does not rely on markers for object localization and does not require the appearance of the object to be previously known, and shows the first use of geometry-based grasping techniques with a flying platform.

Abstract

In a future with autonomous robots, visual and spatial perception is of utmost importance for robotic systems. For aerial robotics in particular, many real-world applications require visual perception. Robotic aerial grasping using drones promises fast pick-and-place solutions with a large increase in mobility over other robotic solutions. Utilizing Mask R-CNN scene segmentation (detectron2), we propose a vision-based system for autonomous rapid aerial grasping which does not rely on markers for object localization and does not require the appearance of the object to be previously known. Combining segmented images with spatial information from a depth camera, we generate a dense point cloud of the detected objects and perform geometry-based grasp planning to determine grasping points on the objects. In real-world experiments on a dynamically grasping aerial platform, we show that our system reaches up to 94.5% of the baseline grasping success rate obtained with a motion capture system for object localization. With our results, we show the first use of geometry-based grasping techniques with a flying platform and aim to increase the autonomy of existing aerial manipulation platforms, bringing them further towards real-world applications in warehouses and similar environments.
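The step of combining a segmentation mask with a depth frame to obtain an object point cloud can be illustrated with a minimal sketch. It assumes a standard pinhole camera model; the function name `masked_depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper, whose actual implementation may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def masked_depth_to_points(
    depth_m: Sequence[Sequence[float]],  # depth in meters, indexed [row][col]
    mask: Sequence[Sequence[bool]],      # True where the pixel belongs to the object
    fx: float, fy: float,                # focal lengths in pixels (assumed known)
    cx: float, cy: float,                # principal point in pixels (assumed known)
) -> List[Point]:
    """Back-project masked depth pixels into 3D camera coordinates."""
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth_m, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the mask and pixels without a valid depth reading.
            if not keep or z <= 0.0:
                continue
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Only pixels that are both inside the mask and have a positive depth contribute a point, so the resulting cloud contains the segmented object rather than the whole scene.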

Paper Structure (34 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: Our proposed real-time scene segmentation and geometry-based grasp planning enables rapid aerial grasping (3 s swoop duration) to pick up a target object using its soft gripper. We eliminate the need for artificial markers on grasp targets and perform grasp planning using an extracted point cloud of the object.
  • Figure 2: The grasp planning pipeline. In a), we see the point cloud as it was created from an unmasked RGB frame and the corresponding depth frame. In b), we see a point cloud created from an RGB frame that is masked around the bottle in the center of the frame. Then, c) shows the full point cloud we get by removing all masked points and applying radius outlier removal. Finally, d) shows the downsampled point cloud fused with a copy of itself that is rotated around the main axis of the point cloud. Grasping candidates are highlighted in red; e) shows the same point cloud in full resolution for better illustration.
  • Figure 3: The dataflow for the vision system integrated into the RAPTOR system. Each block represents a single process. This architecture allows the vision system to be a drop-in replacement for the existing target object localization using a motion capture (MoCap) system.
  • Figure 4: Dataflow of the image streaming pipeline using a hybrid compression scheme (lossy JPG for RGB images and lossless PNG for depth frames). Depth frames require lossless compression to preserve localization accuracy, whereas for RGB frames we can use computationally cheaper lossy JPG compression.
  • Figure 5: The transit time for a pair of one JPG-compressed RGB frame and one PNG-compressed depth frame sent over imagezmq, both with a resolution of 640 by 480 pixels. The mean transit time is 41 ms.
  • ...and 5 more figures
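
The radius outlier removal mentioned in the Figure 2 caption can be sketched as follows. This is a brute-force illustration of the general technique, not the authors' implementation: the function name and the parameters `radius` and `min_neighbors` are hypothetical, and the paper summary does not state which values or library were used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def radius_outlier_removal(
    points: Sequence[Point],
    radius: float,        # neighborhood radius in meters (assumed parameter)
    min_neighbors: int,   # minimum number of other points required within the radius
) -> List[Point]:
    """Keep only points with at least `min_neighbors` other points within `radius`."""
    kept: List[Point] = []
    for i, p in enumerate(points):
        # Count the other points that lie inside the sphere around p.
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept
```

The pairwise comparison is quadratic in the number of points, which is acceptable for a small illustration; for dense clouds a spatial index such as a k-d tree would normally replace the inner loop.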