PRNet: Original Information Is All You Have

PeiHuang Zheng, Yunlong Zhao, Zheng Cui, Yang Li

TL;DR

PRNet tackles information loss in small-object detection for aerial imagery by preserving shallow spatial features throughout the network. It introduces the Progressive Refinement Neck (PRN), which iteratively refines high-resolution features through backbone feature reuse and progressive fusion, and Enhanced SliceSamp (ESSamp), which mitigates detail degradation during downsampling using PixelUnShuffle and an augmented depthwise convolution with a depth multiplier. On VisDrone, AI-TOD, and UAVDT, PRNet outperforms state-of-the-art methods under comparable computational budgets, and ablations show consistent gains in both detail preservation and multi-scale fusion. The framework offers a practical, real-time solution for precise aerial object detection that generalizes across detectors and model scales.

Abstract

Small object detection in aerial images suffers from severe information degradation during feature extraction due to limited pixel representations, where shallow spatial details fail to align effectively with semantic information, leading to frequent misses and false positives. Existing FPN-based methods attempt to mitigate these losses through post-processing enhancements, but the reconstructed details often deviate from the original image information, impeding their fusion with semantic content. To address this limitation, we propose PRNet, a real-time detection framework that prioritizes the preservation and efficient utilization of primitive shallow spatial features to enhance small object representations. PRNet achieves this via two modules: the Progressive Refinement Neck (PRN) for spatial-semantic alignment through backbone reuse and iterative refinement, and the Enhanced SliceSamp (ESSamp) for preserving shallow information during downsampling via optimized rearrangement and convolution. Extensive experiments on the VisDrone, AI-TOD, and UAVDT datasets demonstrate that PRNet outperforms state-of-the-art methods under comparable computational constraints, achieving superior accuracy-efficiency trade-offs.

Paper Structure

This paper contains 15 sections, 8 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Comparative Analysis of Resolution Degradation on Object Visibility Across Datasets. Comparison of object visibility degradation across MS-COCO, VisDrone, and AI-TOD at original, 160×160, and 80×80 resolutions (top to bottom). Small objects exhibit greater impact from losses in edges, textures, and shapes during degradation.
  • Figure 2: Architecture of Progressive Refinement Network. Using YOLO11 as the baseline model, we replace PAN-FPN with our proposed PRN and replace traditional stride convolution downsampling with the proposed ESSamp in the first two layers of the network. The bottom left shows comparisons of feature APs at different stages, demonstrating that the feature quality improves as the number of stages increases.
  • Figure 3: Comparison of PRN and Traditional FPN Architectures. PRN enables backbone feature reuse (orange lines) and progressive fusion (blue lines) for iterative high-resolution feature refinement.
  • Figure 4: ESSamp Module Structure. ESSamp uses PixelUnShuffle for efficient spatial rearrangement and an augmented depthwise convolution (depth multiplier d=2) to enhance feature expressiveness, preserving fine-grained details for small object detection.
  • Figure 5: Visualization of the detection results and heatmaps on VisDrone. The highlighted areas represent the regions that the network is focusing on.
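
The rearrangement step named in the Figure 4 caption can be illustrated with a small dependency-free sketch. It is not the authors' ESSamp implementation: `pixel_unshuffle` is a pure-Python stand-in that follows the channel ordering of `torch.nn.PixelUnshuffle`, `strided_subsample` is a hypothetical naive baseline included only for contrast (it is not the stride convolution the paper replaces), and `essamp_channel_trace` is a hypothetical helper that assumes a downscale factor of r=2 alongside the depth multiplier d=2 stated in the caption. Any layers ESSamp applies after the depthwise convolution are not described here and are left out.

```python
def pixel_unshuffle(x, r=2):
    """Rearrange a [C][H][W] nested list into [C*r*r][H//r][W//r].

    Follows the channel ordering of torch.nn.PixelUnshuffle:
    out[c*r*r + i*r + j][h][w] == x[c][h*r + i][w*r + j].
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    assert H % r == 0 and W % r == 0, "H and W must be divisible by r"
    out = []
    for c in range(C):
        for i in range(r):          # row offset inside each r x r cell
            for j in range(r):      # column offset inside each r x r cell
                out.append([[x[c][h * r + i][w * r + j]
                             for w in range(W // r)]
                            for h in range(H // r)])
    return out


def strided_subsample(x, s=2):
    """Naive baseline (assumption, for contrast only): keep every s-th pixel."""
    return [[row[::s] for row in channel[::s]] for channel in x]


def flatten(t):
    """Flatten a [C][H][W] nested list into a flat list of values."""
    return [v for channel in t for row in channel for v in row]


def essamp_channel_trace(c_in, r=2, d=2):
    """Hypothetical helper: channel counts after the two steps in the caption.

    PixelUnShuffle multiplies channels by r*r; a depthwise convolution with
    depth multiplier d then yields d output channels per input channel.
    """
    after_unshuffle = c_in * r * r
    after_depthwise = after_unshuffle * d
    return after_unshuffle, after_depthwise


# A 1-channel 4x4 map with distinct values 0..15.
x = [[[row * 4 + col for col in range(4)] for row in range(4)]]
u = pixel_unshuffle(x, r=2)      # 4 channels of 2x2: every value survives
s = strided_subsample(x, s=2)    # 1 channel of 2x2: only 4 of 16 values survive

print(len(u), len(u[0]), len(u[0][0]))
print(sorted(flatten(u)) == list(range(16)))
print(len(flatten(s)))
print(essamp_channel_trace(16))
```

The sketch shows why the rearrangement is attractive for small objects: halving the spatial resolution this way moves every input value into the channel dimension rather than discarding any, so whatever convolution follows still has access to all of the original pixels.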