LiM-YOLO: Less is More with Pyramid Level Shift and Normalized Auxiliary Branch for Ship Detection in Optical Remote Sensing Imagery

Seon-Hoon Kim, Hyeji Sim, Youeyun Jung, Ok-Chul Jung, Yerin Kim

TL;DR

LiM-YOLO tackles the core problem of detecting small, elongated ships in optical remote sensing imagery by identifying a scale mismatch between ships and the YOLO detection head. It introduces a Pyramid Level Shift that moves detection to P2-P4 (adding a high-resolution P2 head and pruning P5), coupled with a Group Normalized Auxiliary Branch (GN-CBLinear) to stabilize training on micro-batches. Across four diverse datasets (SODA-A, DOTA-v1.5, FAIR1M-v2.0, ShipRSImageNet-V1), LiM-YOLO achieves state-of-the-art accuracy with substantially fewer parameters and GFLOPs, validating the effectiveness of domain-specific architectural alignment. The approach improves small-ship localization, reduces background noise, and offers practical efficiency for onboard maritime surveillance systems.
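
To make the scale mismatch concrete, the sketch below compares each pyramid level's stride with a ship's minor axis. It assumes strides of 4, 8, 16, and 32 for P2 through P5 and a simple Nyquist-style rule of thumb (at least two feature cells across the narrow side of the ship). The rule, the threshold, the helper names, and the example widths are illustrative assumptions, not the paper's exact criterion.

```python
# Illustrative only: strides assumed for pyramid levels P2..P5.
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def cells_across(minor_axis_px, stride):
    """Number of feature-map cells spanned by the ship's narrow side."""
    return minor_axis_px / stride


def resolving_levels(minor_axis_px, min_cells=2.0):
    """Levels where the narrow side covers at least `min_cells` cells.

    `min_cells=2.0` is a Nyquist-style rule of thumb assumed for this sketch.
    """
    return [
        name
        for name, stride in STRIDES.items()
        if cells_across(minor_axis_px, stride) >= min_cells
    ]


if __name__ == "__main__":
    for width in (8, 16, 64):  # hypothetical minor-axis lengths in pixels
        print(width, resolving_levels(width))
```

Under these assumptions, a ship only 8 pixels wide is adequately sampled at P2 alone, while P5 contributes only for ships at least 64 pixels wide, which is the intuition behind adding P2 and pruning P5.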

Abstract

Applying general-purpose object detectors to ship detection in satellite imagery presents significant challenges due to the extreme scale disparity and morphological anisotropy of maritime targets. Standard architectures utilizing stride-32 (P5) layers often fail to resolve narrow vessels, resulting in spatial feature dilution. In this work, we propose LiM-YOLO, a specialized detector designed to resolve these domain-specific conflicts. Based on a statistical analysis of ship scales, we introduce a Pyramid Level Shift Strategy that reconfigures the detection head to P2-P4. This shift ensures compliance with Nyquist sampling criteria for small objects while eliminating the computational redundancy of deep layers. To further enhance training stability on high-resolution inputs, we incorporate a Group Normalized Convolutional Block for Linear Projection (GN-CBLinear), which mitigates gradient volatility in micro-batch settings. Validated on SODA-A, DOTA-v1.5, FAIR1M-v2.0, and ShipRSImageNet-V1, LiM-YOLO demonstrates superior detection accuracy and efficiency compared to state-of-the-art models. The code is available at https://github.com/egshkim/LiM-YOLO.
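
As a rough illustration of why group normalization suits micro-batch training, the following pure-Python sketch contrasts it with batch normalization. It is not the authors' GN-CBLinear block: it shows only the normalization step, omits the learnable scale and shift, and uses hypothetical function names and toy data. Group normalization computes its statistics within a single sample, so the result does not depend on which other samples share the batch; batch normalization pools statistics across the batch, so it does.

```python
import math


def _normalize(values, eps):
    """Return (mean, inverse std) of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, 1.0 / math.sqrt(var + eps)


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample; `sample` is a list of channels (flat float lists)."""
    channels = len(sample)
    assert channels % num_groups == 0, "channels must divide evenly into groups"
    size = channels // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        mean, inv_std = _normalize([v for ch in group for v in ch], eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out


def batch_norm(batch, eps=1e-5):
    """Per-channel normalization with statistics pooled over the whole batch."""
    channels = len(batch[0])
    out = [[None] * channels for _ in batch]
    for c in range(channels):
        mean, inv_std = _normalize([v for s in batch for v in s[c]], eps)
        for i, s in enumerate(batch):
            out[i][c] = [(v - mean) * inv_std for v in s[c]]
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]      # toy sample: 2 channels, 2 values each
    b = [[10.0, 20.0], [30.0, 40.0]]  # a different batch companion
    c = [[0.0, 0.0], [0.0, 1.0]]      # another batch companion
    # Batch norm output for `a` changes with its batch companion.
    print(batch_norm([a, b])[0] != batch_norm([a, c])[0])
    # Group norm output for `a` is computed from `a` alone.
    print(group_norm(a, num_groups=1))
```

With a batch of one or two images, the pooled statistics in `batch_norm` become noisy, whereas `group_norm` is unaffected by batch size by construction.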

Paper Structure

This paper contains 33 sections, 6 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Heatmap of Ship Major Axis Distribution
  • Figure 2: Heatmap of Ship Minor Axis Distribution
  • Figure 3: The overall architecture of the YOLOv9-E baseline. The detection head follows the conventional multi-scale configuration at pyramid levels P3, P4, and P5 (Strides 8, 16, 32).
  • Figure 4: Effective Receptive Field of YOLOv9-E when the head level is P2, P3, P4, or P5, with approximate diameters of 667.7, 860.5, 1024.7, and 1112.2 pixels, respectively.
  • Figure 5: The overall architecture of the proposed LiM-YOLO. To address the scale mismatch in satellite imagery, we shift the detection pyramid levels from the conventional P3-P5 to P2-P4. The High-Resolution P2 Head (Stride 4) is introduced to recover fine-grained spatial details of small ships, while the deep P5 layers are pruned to eliminate receptive field redundancy.
  • ...and 3 more figures