Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing
Jaemin Kang, Hoeseok Yang, Hyungshin Kim
TL;DR
This work tackles on-device object detection for remote sensing by removing the feature pyramid from two-stage detectors and performing regression on a single feature map. To offset the accuracy loss, it optimizes anchor sizes to match the dataset's object-size distribution and applies a 5×5 high-pass filter to the RPN score map to strengthen small-object RoIs. The approach reduces FLOPs by up to 61.2% with a modest mAP drop (≈2%) across state-of-the-art detectors such as ReDet, Oriented-RCNN, and LSKNet on DOTAv1.5. Although effective for real-time inference, the method can introduce noise and is currently limited to two-stage detectors; future work includes extending it to one-stage detectors and recovering more accuracy at a controlled compute cost.
Abstract
Deep learning has been successfully applied to object detection in remotely sensed images. Images are typically processed on the ground rather than on board because ground systems offer greater computing power. Such offloaded processing delays the acquisition of target mission information, which hinders real-time use cases. For on-device object detection, prior research has focused on designing efficient detectors or on model compression to reduce inference latency. However, the acceleration of highly accurate two-stage detectors remains underexplored. In this paper, we propose a model simplification method for two-stage object detectors. Instead of constructing a general feature pyramid, we use only a single feature map in the two-stage detector. To compensate for the accuracy drop, we apply a high-pass filter to the RPN's score map. Our approach is applicable to any two-stage detector that uses a feature pyramid network. In experiments with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet, our method reduced computation cost by up to 61.2% with an accuracy loss within 2.1% on the DOTAv1.5 dataset. Source code will be released.
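
As a rough illustration of the score-map filtering step, the sketch below applies a 5×5 high-pass filter to a 2-D grid of RPN objectness scores. The text above does not give the kernel coefficients or say how the filtered response is combined with the original scores, so everything here is an assumption: the filter subtracts the 5×5 local mean from each score, borders are handled by edge replication, and the names `high_pass_5x5`, `sharpen_scores`, and `alpha` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def high_pass_5x5(score_map: Grid) -> Grid:
    """Subtract the 5x5 local mean from each score.

    Assumed kernel: identity minus a 5x5 box filter. Borders are handled by
    replicating the nearest edge value (also an assumption).
    """
    h, w = len(score_map), len(score_map[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += score_map[yy][xx]
            out[y][x] = score_map[y][x] - total / 25.0
    return out


def sharpen_scores(score_map: Grid, alpha: float = 1.0) -> Grid:
    """Add the high-pass response back to the scores.

    `alpha` is a hypothetical weighting factor, not a value from the paper.
    """
    hp = high_pass_5x5(score_map)
    return [
        [s + alpha * r for s, r in zip(score_row, hp_row)]
        for score_row, hp_row in zip(score_map, hp)
    ]
```

Under these assumptions, an isolated peak (as a small object might produce on a coarse feature map) is boosted relative to its neighbours, while a uniformly scored region is left unchanged. The same boost applies to isolated false responses, which is consistent with the noise caveat noted in the TL;DR.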
