Table of Contents
Fetching ...

Keypoint Detection Technique for Image-Based Visual Servoing of Manipulators

Niloufar Amiri, Guanghui Wang, Farrokh Janabi-Sharifi

Abstract

This paper introduces an innovative keypoint detection technique based on Convolutional Neural Networks (CNNs) to enhance the performance of existing Deep Visual Servoing (DVS) models. To validate the convergence of the Image-Based Visual Servoing (IBVS) algorithm, real-world experiments utilizing fiducial markers for feature detection are conducted before designing the CNN-based feature detector. To address the limitations of fiducial markers, the novel feature detector focuses on extracting keypoints that represent the corners of a more realistic object compared to fiducial markers. A dataset is generated from sample data captured by the camera mounted on the robot end-effector while the robot operates randomly in the task space. The samples are automatically labeled, and the dataset size is increased by flipping and rotation. The CNN model is developed by modifying the VGG-19 pre-trained on the ImageNet dataset. While the weights in the base model remain fixed, the fully connected layer's weights are updated to minimize the mean absolute error, defined based on the deviation of predictions from the real pixel coordinates of the corners. The model undergoes two modifications: replacing max-pooling with average-pooling in the base model and implementing an adaptive learning rate that decreases during epochs. These changes lead to a 50 percent reduction in validation loss. Finally, the trained model's reliability is assessed through k-fold cross-validation.

Keypoint Detection Technique for Image-Based Visual Servoing of Manipulators

Abstract

This paper introduces an innovative keypoint detection technique based on Convolutional Neural Networks (CNNs) to enhance the performance of existing Deep Visual Servoing (DVS) models. To validate the convergence of the Image-Based Visual Servoing (IBVS) algorithm, real-world experiments utilizing fiducial markers for feature detection are conducted before designing the CNN-based feature detector. To address the limitations of fiducial markers, the novel feature detector focuses on extracting keypoints that represent the corners of a more realistic object compared to fiducial markers. A dataset is generated from sample data captured by the camera mounted on the robot end-effector while the robot operates randomly in the task space. The samples are automatically labeled, and the dataset size is increased by flipping and rotation. The CNN model is developed by modifying the VGG-19 pre-trained on the ImageNet dataset. While the weights in the base model remain fixed, the fully connected layer's weights are updated to minimize the mean absolute error, defined based on the deviation of predictions from the real pixel coordinates of the corners. The model undergoes two modifications: replacing max-pooling with average-pooling in the base model and implementing an adaptive learning rate that decreases during epochs. These changes lead to a 50 percent reduction in validation loss. Finally, the trained model's reliability is assessed through k-fold cross-validation.
Paper Structure (13 sections, 3 equations, 10 figures, 3 tables)

This paper contains 13 sections, 3 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Schematic of joints and links
  • Figure 2: The mounted camera on the robot with an eye-in-hand configuration
  • Figure 3: The change in pixel coordinates of the Keypoints over time, from the initial position (black quadrilateral) to the desired position (red quadrilateral)
  • Figure 4: Second norm of the pixel coordinate error for four keypoints
  • Figure 5: Examples of annotated images created by conventional image processing techniques
  • ...and 5 more figures