Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction

Shayan Sepahvand, Niloufar Amiri, Farrokh Janabi-Sharifi

TL;DR

This work tackles markerless image-based visual servoing (IBVS) for unmanned aerial vehicles (UAVs) by using a CNN to predict the four corner keypoints of a target object, which then serve as the image features for control. It integrates a formal IBVS controller with the image Jacobian through the pseudo-inverse law $\mathbf{v}_c = \lambda \mathbf{L}^\dagger \mathbf{e}$, and maps the resulting twist into the UAV frame using the adjoint transformation. The CNN is built on a VGG-19 backbone with a four-corner regression head and is trained with transfer learning and an MAE loss. It is evaluated in physics-based ROS Gazebo simulations to assess robustness to occlusion, illumination, clutter, and background changes. The approach is reported to run in real time (CNN 60–130 ms, 30 Hz data rate) and demonstrates marker-free pose feedback, broadening the applicability of perception-guided motion control for aerial robotics, while identifying background variation as an area for future improvement.
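The control step summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the classical point-feature interaction matrix in normalized image coordinates, known (or estimated) keypoint depths, an error defined as desired minus current features so that the sign matches $\mathbf{v}_c = \lambda \mathbf{L}^\dagger \mathbf{e}$, and a twist ordered as (linear, angular). The function names, the gain value, and the camera-to-body extrinsics `R`, `t` are hypothetical and chosen only for illustration.

```python
import numpy as np

def interaction_matrix(points, depths):
    """Stack the classical 2x6 point-feature interaction matrices (assumed form).

    points: (N, 2) normalized image coordinates (x, y) of the keypoints.
    depths: (N,) assumed depth Z of each keypoint in the camera frame.
    """
    rows = []
    for (x, y), Z in zip(points, depths):
        rows.append([-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x])
    return np.array(rows)

def ibvs_twist(s, s_star, depths, lam=0.5):
    """Camera twist v_c = lam * pinv(L) @ e, with e = s_star - s (assumed sign)."""
    e = (np.asarray(s_star) - np.asarray(s)).reshape(-1)
    L = interaction_matrix(s, depths)
    return lam * np.linalg.pinv(L) @ e

def skew(t):
    """Skew-symmetric matrix such that skew(t) @ w equals np.cross(t, w)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def adjoint(R, t):
    """6x6 twist transformation from the camera frame to the body frame.

    R, t: hypothetical rotation and translation of the camera frame
    expressed in the body frame. Twists are ordered (linear, angular).
    """
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[:3, 3:] = skew(t) @ R
    Ad[3:, 3:] = R
    return Ad

# Four desired corner keypoints and a current view shifted along x.
s_star = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
s = s_star + np.array([0.05, 0.0])
depths = np.ones(4)

v_c = ibvs_twist(s, s_star, depths, lam=0.5)
v_b = adjoint(np.eye(3), np.array([0.0, 0.0, 0.1])) @ v_c
```

In a full pipeline, `s` would come from the CNN's four predicted corners at each frame, and `v_b` would be passed to the UAV's velocity controller. The depth values and extrinsics above are placeholders rather than values from the paper.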

Abstract

The problem of image-based visual servoing (IBVS) of an aerial robot using deep-learning-based keypoint detection is addressed in this article. A monocular RGB camera mounted on the platform is utilized to collect the visual data. A convolutional neural network (CNN) is then employed to extract the features serving as the visual data for the servoing task. This paper contributes to the field not only by circumventing the need for man-made marker detection in conventional visual servoing techniques, but also by enhancing robustness against undesirable factors including occlusion, varying illumination, clutter, and background changes, thereby broadening the applicability of perception-guided motion control tasks in aerial robots. Additionally, extensive physics-based ROS Gazebo simulations are conducted to assess the effectiveness of this method, in contrast to many existing studies that rely solely on physics-less simulations. A demonstration video is available at https://youtu.be/Dd2Her8Ly-E.

Paper Structure

This paper contains 9 sections, 7 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: The mounted camera on the OpenMANIPULATOR-X with an eye-in-hand configuration for data collection.
  • Figure 2: Sample of the images of the dataset.
  • Figure 3: Learning curves showing how the training and validation loss functions change for two models.
  • Figure 4: Various worlds created in Gazebo were utilized to carry out robustness tests.
  • Figure 5: The performance of the controller in the absence of undesirable factors.
  • ...and 4 more figures