Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing

Jaemin Kang, Hoeseok Yang, Hyungshin Kim

TL;DR

This work tackles on-device object detection for remote sensing by removing the feature pyramid from two-stage detectors and performing regression on a single feature map. To offset the accuracy loss, it tunes anchor sizes to match the dataset's object-size distribution and applies a 5×5 high-pass filter to the RPN score map to strengthen small-object RoIs. The approach reduces FLOPs by up to 61.2% with modest mAP degradation (≈2%) across state-of-the-art detectors such as ReDet, Oriented-RCNN, and LSKNet on DOTAv1.5. While effective for real-time inference, the method can introduce noise and is currently limited to two-stage detectors; future work includes extending it to one-stage detectors and recovering more accuracy at a controlled compute overhead.

Abstract

Deep learning has been successfully applied to object detection in remotely sensed images. Images are typically processed on the ground rather than on board because ground systems have more computing power. Such offloaded processing delays the acquisition of target mission information, which hinders its use in real-time applications. For on-device object detection, research has focused on designing efficient detectors or on model compression to reduce inference latency. However, highly accurate two-stage detectors still leave room for acceleration. In this paper, we propose a model simplification method for two-stage object detectors. Instead of constructing a general feature pyramid, we use only a single extracted feature in the two-stage detector. To compensate for the accuracy drop, we apply a high-pass filter to the RPN's score map. Our approach is applicable to any two-stage detector that uses a feature pyramid network. In experiments with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet, our method reduced computation costs by up to 61.2% with an accuracy loss within 2.1% on the DOTAv1.5 dataset. Source code will be released.
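The high-pass filtering step mentioned above can be illustrated with a minimal sketch. The summary states only that a 5×5 high-pass filter is applied to the RPN score map; the kernel weights used here (centre 24, all others -1, so the weights sum to zero), the edge-replication border handling, and the function name `high_pass_5x5` are assumptions made for illustration, not the authors' implementation.

```python
def high_pass_5x5(score_map):
    """Convolve a 2D score map (list of lists of floats) with a 5x5 kernel.

    Assumed kernel (not taken from the paper): centre weight 24, every
    other weight -1. The weights sum to zero, so flat regions map to 0
    and isolated peaks (e.g. small objects) are emphasised.
    Borders are handled by replicating the nearest edge value.
    """
    h, w = len(score_map), len(score_map[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    # Clamp indices so border pixels reuse edge values.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    weight = 24.0 if (dy == 0 and dx == 0) else -1.0
                    acc += weight * score_map[yy][xx]
            out[y][x] = acc
    return out


# Toy 7x7 score map with a single isolated response in the middle.
peak = [[0.0] * 7 for _ in range(7)]
peak[3][3] = 1.0
filtered = high_pass_5x5(peak)
print(filtered[3][3], filtered[3][4])
```

A zero-sum kernel is one common way to build a high-pass filter: uniform background scores cancel out, while a narrow response that stands above its neighbours is amplified. How the filtered map is combined with the original scores before RoI extraction is not specified in this summary, so the sketch stops at the filter itself.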

Paper Structure (13 sections, 7 figures, 6 tables)

This paper contains 13 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Computation workload analysis of a two-stage detector, LSKNet [li2303large]. The input image is 1024 × 1024 pixels. LSKNet-T is created by reducing the backbone of LSKNet-S. The graph on the right shows that our method is effective on the regression part of the detector.
  • Figure 2: A two-stage detector implemented using our approach. The layers needed to construct the feature pyramid have been removed. Furthermore, because both the RPN and RoIAlign use a single feature, operations that are not counted in FLOPs are also reduced.
  • Figure 3: Ratio of objects matched by anchors in each feature. The graphs show the proportion of objects in the DOTAv1.5 validation set that have a matched anchor with an IoU of 0.5 or higher. The left graph is the result for the original model with anchor sizes of 32, 64, 128, 256, and 512. The right graph is the result for the modified anchor sizes of 16, 32, 64, 128, and 256.
  • Figure 4: Visualization of the anchor stride as a function of the feature's downscale factor for an image size of 256 × 256. The red dots mark the locations where anchors are placed in the feature. To detect a car, an anchor size of $32^2$ is sufficient; however, it is not trained on the P5 feature because of the low IoU. In the P3 and P4 features, the IoU between the anchor and the object exceeds the threshold for positive anchors.
  • Figure 5: Classification score map after the image has passed through the RPN in the detector. The scores were measured with the Oriented R-CNN model. The scores in the graph are passed through a sigmoid function before the RoIs are extracted from them.
  • ...and 2 more figures
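
The anchor-matching idea behind Figures 3 and 4 can be sketched with a small, self-contained example. It is a simplified illustration under stated assumptions: boxes are axis-aligned (the detectors in the paper use oriented boxes), the strides 8, 16, and 32 for P3, P4, and P5 follow the usual FPN convention rather than anything stated in the captions, and the object box and all function names are hypothetical.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def square_anchors(image_size, stride, anchor_size):
    """Square anchors centred on a regular grid with the given stride."""
    half = anchor_size / 2.0
    centres = [stride * i + stride / 2.0 for i in range(image_size // stride)]
    return [(cx - half, cy - half, cx + half, cy + half)
            for cy in centres for cx in centres]


def best_iou(box, anchors):
    """Highest IoU between one object box and any anchor."""
    return max(iou(box, a) for a in anchors)


def matched_ratio(objects, anchors, threshold=0.5):
    """Fraction of objects with at least one anchor at IoU >= threshold."""
    matched = sum(1 for o in objects if best_iou(o, anchors) >= threshold)
    return matched / len(objects)


# Hypothetical 32x32 object in a 256x256 image, with 32x32 anchors.
car = (108.0, 108.0, 140.0, 140.0)
for name, stride in (("P3", 8), ("P4", 16), ("P5", 32)):
    anchors = square_anchors(256, stride, 32)
    print(name, stride, round(best_iou(car, anchors), 3))
```

With this particular box, the best anchor on the dense stride-8 and stride-16 grids clears the 0.5 threshold, while the sparse stride-32 grid has no anchor close enough to the object, even though the anchor size is the same in all three cases. `matched_ratio` applies the same test over a list of objects, which is the kind of quantity Figure 3 reports per feature level.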