High-Speed Detector For Low-Powered Devices In Aerial Grasping

Ashish Kumar, Laxmidhar Behera

TL;DR

Fast Fruit Detector (FFD), a resource-efficient, single-stage, and postprocessing-free object detector based on the authors' novel latent object representation module, query assignment, and prediction strategy, is presented.

Abstract

Autonomous aerial harvesting is a highly complex problem because it requires numerous interdisciplinary algorithms to be executed on mini low-powered computing devices. Object detection is one such compute-hungry algorithm. In this context, we make the following contributions: (i) Fast Fruit Detector (FFD), a resource-efficient, single-stage, and postprocessing-free object detector based on our novel latent object representation (LOR) module, query assignment, and prediction strategy. FFD achieves 100FPS@FP32 precision on the latest 10W NVIDIA Jetson-NX embedded device while co-existing with other time-critical sub-systems such as control, grasping, and SLAM, a major achievement of this work. (ii) A method to generate vast amounts of training data without exhaustive manual labelling of fruit images, which is costly and time-consuming because each image contains a large number of instances. (iii) An open-source fruit detection dataset with plenty of very small instances that are difficult to detect. Our exhaustive evaluations on our dataset and the MinneApple dataset show that FFD, despite being only a single-scale detector, is more accurate than many representative detectors: FFD is better than single-scale Faster-RCNN by 10.7 AP, multi-scale Faster-RCNN by 2.3 AP, the latest single-scale YOLO-v8 by 8 AP, and multi-scale YOLO-v8 by 0.3 AP, while being considerably faster.

Paper Structure (39 sections, 11 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Top: Our aerial grasping system for fruit harvesting, and outdoor detection. Bottom: FFD has low training time, high inference speed, and high detection accuracy compared to existing detectors.
  • Figure 2: (a) Faster-RCNN, (b) SSD, (c) DETR, and (d) FFD.
  • Figure 3: Fast-Fruit-Detector (FFD). 'GP': Global Pooling, 'E': Expand, 'S': Squeeze, and '': Broadcast multiplication.
  • Figure 4: FFD has a novel query assignment. In traditional detectors such as Faster-RCNN and SSD, a query is simply an anchor. '$t_{ij}$' denotes a tile in the image.
  • Figure 5: Differences between FFD and DETR-like methods.
  • ...and 4 more figures