
Investigating Long-term Training for Remote Sensing Object Detection

JongHyun Park, Yechan Kim, Moongu Jeon

TL;DR

This work tackles the challenge of long-term training for remote sensing object detection, where the backbone must preserve the generic low-level features learned in pre-training while also acquiring domain-specific knowledge. It introduces Dynamic Backbone Freezing (DBF), a simple, framework-agnostic approach in which a Freezing Scheduler alternates between freezing and updating the backbone to balance generalization and specialization; the Step Freezing Scheduler variant is governed by a hyperparameter $\rho$. Across DOTA and DIOR-R, DBF yields higher mAP and substantially lower training cost ($F_{total}$) than full training or frozen-backbone baselines, and it is compatible with both CNN and transformer backbones, including OrientedFormer. The method is practical for large-scale remote sensing pipelines and could be extended to other tasks such as segmentation and change detection, as well as to more adaptive schedulers.

Abstract

Recently, numerous methods have achieved impressive performance in remote sensing object detection, relying on convolution or transformer architectures. Such detectors typically have a feature backbone to extract useful features from raw input images. A common practice in current detectors is to initialize the backbone with pre-trained weights available online. Fine-tuning the backbone is typically required to generate features suitable for remote sensing images. Prolonged training can lead to over-fitting, which hinders the extraction of basic visual features, but it can also enable models to gradually extract deeper insights and richer representations from remote sensing data. Striking a balance between these competing factors is critical for achieving optimal performance. In this study, we investigate the performance and characteristics of remote sensing object detection models under very long training schedules, and propose a novel method named Dynamic Backbone Freezing (DBF) for fine-tuning the feature backbone in remote sensing object detection under long-term training. Our method addresses the dilemma of whether the backbone should extract low-level generic features or possess specific knowledge of the remote sensing domain by introducing a module called 'Freezing Scheduler', which dynamically manages the update of backbone features during long-term training. Extensive experiments on DOTA and DIOR-R show that our approach enables more accurate model learning while substantially reducing computational costs in long-term training. In addition, its straightforward design means it can be adopted without additional effort. The code is available at https://github.com/unique-chan/dbf.
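
To make the idea of a Freezing Scheduler concrete, the sketch below shows one plausible step rule. It is an assumption, not the authors' implementation: the function names and the exact rule (unfreeze the backbone on every $\rho$-th epoch, keep it frozen otherwise, and treat $\rho = \infty$ as never unfreezing) are hypothetical and chosen only to illustrate how a single hyperparameter can interpolate between full training and a frozen backbone.

```python
import math
from typing import List


def backbone_trainable(epoch: int, rho: float) -> bool:
    """Hypothetical step rule (an assumption, not the paper's definition).

    The backbone is updated on every rho-th epoch (epochs are 1-indexed)
    and kept frozen on all other epochs.
    """
    if rho < 1:
        raise ValueError("rho must be >= 1")
    if math.isinf(rho):
        return False  # never unfreeze: behaves like a frozen backbone
    return epoch % int(rho) == 0


def schedule(num_epochs: int, rho: float) -> List[bool]:
    """Per-epoch flags: True where the backbone would receive gradients."""
    return [backbone_trainable(e, rho) for e in range(1, num_epochs + 1)]


def count_backbone_updates(num_epochs: int, rho: float) -> int:
    """Number of epochs in which the backbone is updated under this rule."""
    return sum(schedule(num_epochs, rho))


if __name__ == "__main__":
    for rho in (1, 10, math.inf):
        print(rho, count_backbone_updates(400, rho))
```

Under this assumed rule, $\rho = 1$ reduces to full training and $\rho = \infty$ to a frozen backbone, with intermediate values skipping the backbone's backward pass on most epochs. In a PyTorch training loop, the per-epoch flag would typically be applied by toggling `requires_grad` on the backbone parameters.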

Paper Structure (20 sections, 4 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Illustration of training remote sensing object detectors with the proposed framework, Dynamic Backbone Freezing (DBF). DBF allows the backbone to receive knowledge from the downstream data when the Freezing Scheduler sends a signal to open the backward route. This ensures robust prediction by preventing over-fitting while also significantly reducing training costs.
  • Figure 2: Impact of the proposed method on remote sensing object detection. The visualization shows performance on the DOTA and DIOR-R benchmark datasets for FCOS with ResNet-50. Under long-term training, our method achieves the highest prediction accuracy (mAP) while significantly reducing computational costs (FLOPs) compared to existing methods. Besides FCOS, Faster R-CNN and RetinaNet also show similar trends.
  • Figure 3: Visual illustration of long-term training strategies for remote sensing object detection over 400 epochs: (a) Full Training, (b) Frozen Backbone, (c) Ours (DBF).
  • Figure 4: Learning curves on DOTA and DIOR-R with Faster R-CNN and Swin-S. Although our DBF reduces the training loss more slowly than 'Full Training', it yields the most stable and highest validation mAP. In particular, the validation mAP shows a consistent upward trend with our method under long-term training. 'Frozen Backbone' incurs the lowest training cost, but its performance degrades considerably compared to the other approaches.
  • Figure 5: Effects of further training with the proposed method for RetinaNet on DIOR-R. Although our DBF achieves worse mAP than the baseline when $\rho \in \{10, \infty\}$ during the initial training that follows the policies in Table \ref{tab1}, its prediction accuracy eventually improves with longer additional training. It also brings steady performance improvement even in long-term training, compared to other methods.