YOLO-SPCI: Enhancing Remote Sensing Object Detection via Selective-Perspective-Class Integration

Xinyuan Wang, Lian Peng, Xiangcheng Li, Yilin He, KinTak U

TL;DR

This work tackles the challenges that extreme scale variation and dense, cluttered scenes pose for remote sensing object detection by embedding a lightweight Selective-Perspective-Class Integration (SPCI) module into the YOLOv8 backbone. SPCI unifies three attention-driven components: the Selective Stream Gate (SSG), the Perspective Fusion Module (PFM), and the Class Discrimination Module (CDM). Together they improve global context, multi-scale fusion, and inter-class separability. Two SPCI blocks are inserted at P3 and P5 to enhance both high-resolution localization and deep semantic features, while keeping the neck and head intact. On the NWPU VHR-10 and DIOR benchmarks, YOLO-SPCI achieves state-of-the-art accuracy with strong efficiency, demonstrating the practical value of multi-dimensional, lightweight attention for remote sensing detection.

Abstract

Object detection in remote sensing imagery remains a challenging task due to extreme scale variation, dense object distributions, and cluttered backgrounds. While recent detectors such as YOLOv8 have shown promising results, their backbone architectures lack explicit mechanisms to guide multi-scale feature refinement, limiting performance on high-resolution aerial data. In this work, we propose YOLO-SPCI, an attention-enhanced detection framework that introduces a lightweight Selective-Perspective-Class Integration (SPCI) module to improve feature representation. The SPCI module integrates three components: a Selective Stream Gate (SSG) for adaptive regulation of global feature flow, a Perspective Fusion Module (PFM) for context-aware multi-scale integration, and a Class Discrimination Module (CDM) to enhance inter-class separability. We embed two SPCI blocks into the P3 and P5 stages of the YOLOv8 backbone, enabling effective refinement while preserving compatibility with the original neck and head. Experiments on the NWPU VHR-10 dataset demonstrate that YOLO-SPCI achieves superior performance compared to state-of-the-art detectors.
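
The placement described above (two SPCI blocks at the P3 and P5 backbone stages, neck and head untouched) can be illustrated with a small layout sketch. It only shows where the blocks might sit and is not the authors' implementation. The stage list follows the publicly available YOLOv8 backbone configuration. The helper `insert_spci` and the choice to place each block after the last layer of its pyramid level (after SPPF in the case of P5) are assumptions made for illustration, since the abstract does not state the exact insertion point.

```python
# Layout-only sketch: no tensors, no training, just the order of backbone stages.
# Each entry is (layer type, pyramid level / stride), following the public
# YOLOv8 backbone configuration.
BACKBONE = [
    ("Conv", "P1/2"),
    ("Conv", "P2/4"),
    ("C2f", "P2/4"),
    ("Conv", "P3/8"),
    ("C2f", "P3/8"),
    ("Conv", "P4/16"),
    ("C2f", "P4/16"),
    ("Conv", "P5/32"),
    ("C2f", "P5/32"),
    ("SPPF", "P5/32"),
]


def insert_spci(backbone, levels=("P3", "P5")):
    """Return a new stage list with an SPCI block appended to each chosen level.

    Hypothetical helper. ASSUMPTION: the block goes after the last layer that
    operates at the given pyramid level; the source does not specify this.
    """
    out = []
    for i, (name, scale) in enumerate(backbone):
        out.append((name, scale))
        level = scale.split("/")[0]
        # True when the next layer (if any) works at a different resolution.
        is_last_at_level = i + 1 == len(backbone) or backbone[i + 1][1] != scale
        if level in levels and is_last_at_level:
            # SPCI stands in for the SSG + PFM + CDM combination from the abstract.
            out.append(("SPCI", scale))
    return out


if __name__ == "__main__":
    for name, scale in insert_spci(BACKBONE):
        print(f"{scale:>6}  {name}")
```

Because the blocks only refine features at existing resolutions, the number and scale of the feature maps handed to the neck stay the same, which is consistent with the claim that the original neck and head remain compatible.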

Paper Structure

This paper contains 17 sections, 8 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The proposed selective-perspective-class integration (SPCI) module integrates the selective stream gate (SSG), perspective fusion module (PFM), and class discrimination module (CDM), and is embedded into the YOLOv8 backbone to enhance feature representation for object detection.
  • Figure 2: Quantitative comparison of mAP50 and mAP50-95 across all ablation variants (B1–B8).
  • Figure 3: Comparison of detection results between the baseline model and the proposed SPCI module with ablation studies on the NWPU VHR-10 dataset.