YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images

Huma Hafeez, Matthew Garratt, Jo Plested, Sankaran Iyer, Arcot Sowmya

TL;DR

This work tackles real-time small-object detection in 4K panoramic imagery (360° equirectangular projection, ERP), a setting where conventional detectors struggle because of projection distortions and computational cost. It introduces YOLO11-4K, an end-to-end architecture that combines a lightweight GhostConv-based backbone with a four-scale P2–P5 detection framework, in which the added P2 head targets small objects, enabling efficient 4K processing. A 6,876-image CVIP360 dataset with detection annotations is created to benchmark high-resolution 360° detection, and the model is additionally evaluated in a cross-dataset setting on MRTMD, demonstrating strong small-object performance and substantial speed gains (about 28.3 ms per 4K frame, a roughly 75% latency reduction versus YOLO11). The results establish a practical, scalable approach for real-time panoramic perception with potential applications in autonomous navigation, surveillance, and AR/VR, and provide a publicly available benchmark for future research.
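To make the role of the P2 head concrete, the following minimal sketch computes the feature-map grid at each of the four scales. It assumes the usual YOLO convention that level Pn has stride 2^n, and it assumes a 2:1 ERP frame of 3840×1920 pixels; the summary above says only "4K", so the exact resolution and the 16-pixel example object are illustrative choices, not values from the paper.

```python
# Assumption: a 2:1 ERP frame of 3840x1920 pixels; the source only says "4K".
WIDTH, HEIGHT = 3840, 1920

# Conventional YOLO pyramid levels: level Pn downsamples the input by 2**n.
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def grid_size(width, height, stride):
    """Return (columns, rows) of the feature map at the given stride."""
    return width // stride, height // stride


def cells_covered(object_px, stride):
    """Number of grid cells spanned by one side of an object of object_px pixels."""
    return object_px / stride


if __name__ == "__main__":
    # Hypothetical small object: 16 pixels on a side.
    for name, stride in STRIDES.items():
        cols, rows = grid_size(WIDTH, HEIGHT, stride)
        span = cells_covered(16, stride)
        print(f"{name}: stride {stride:>2}, grid {cols}x{rows}, "
              f"16 px object spans {span:.1f} cells")
```

Under these assumptions, a 16-pixel object covers four cells per side at P2 but only half a cell at P5, which is the intuition behind adding a finer detection scale for small and distant objects.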

Abstract

The processing of omnidirectional 360-degree images poses significant challenges for object detection due to inherent spatial distortions, wide fields of view, and ultra-high-resolution inputs. Conventional detectors such as YOLO are optimised for standard image sizes (for example, 640x640 pixels) and often struggle with the computational demands of 4K or higher-resolution imagery typical of 360-degree vision. To address these limitations, we introduce YOLO11-4K, an efficient real-time detection framework tailored for 4K panoramic images. The architecture incorporates a novel multi-scale detection head with a P2 layer to improve sensitivity to small objects often missed at coarser scales, and a GhostConv-based backbone to reduce computational complexity without sacrificing representational power. To enable evaluation, we manually annotated the CVIP360 dataset, generating 6,876 frame-level bounding boxes and producing a publicly available, detection-ready benchmark for 4K panoramic scenes. YOLO11-4K achieves 0.95 mAP at 0.50 IoU with 28.3 milliseconds inference per frame, representing a 75 percent latency reduction compared to YOLO11 (112.3 milliseconds), while also improving accuracy (mAP at 0.50 of 0.95 versus 0.908). This balance of efficiency and precision enables robust object detection in expansive 360-degree environments, making the framework suitable for real-world high-resolution panoramic applications. While this work focuses on 4K omnidirectional images, the approach is broadly applicable to high-resolution detection tasks in autonomous navigation, surveillance, and augmented reality.
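The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the reported latencies and mAP values; the function names are illustrative, and the frames-per-second figures are derived here under the assumption that frames are processed one at a time, so they are not numbers reported by the authors.

```python
def latency_reduction(baseline_ms, new_ms):
    """Fractional reduction in latency relative to the baseline."""
    return (baseline_ms - new_ms) / baseline_ms


def frames_per_second(latency_ms):
    """Throughput implied by a per-frame latency, assuming sequential frames."""
    return 1000.0 / latency_ms


# Values reported in the abstract.
YOLO11_MS = 112.3
YOLO11_4K_MS = 28.3
YOLO11_MAP50 = 0.908
YOLO11_4K_MAP50 = 0.95

if __name__ == "__main__":
    reduction = latency_reduction(YOLO11_MS, YOLO11_4K_MS)
    print(f"Latency reduction: {reduction:.1%}")
    print(f"Implied throughput, YOLO11:    {frames_per_second(YOLO11_MS):.1f} FPS")
    print(f"Implied throughput, YOLO11-4K: {frames_per_second(YOLO11_4K_MS):.1f} FPS")
    print(f"mAP@0.50 gain: {YOLO11_4K_MAP50 - YOLO11_MAP50:.3f}")
```

The reduction works out to (112.3 − 28.3) / 112.3 ≈ 74.8%, consistent with the stated 75 percent, and the speed-up factor is roughly 4×.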

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 2: 4K CVIP360 ERP images with annotated small and distant pedestrians in indoor and outdoor 360° scenes.
  • Figure 3: YOLO11-4K detection results on challenging 4K panoramic scenes. Both indoor and outdoor scenarios are shown, highlighting successful detection of small and occluded objects.
  • Figure 4: Quantitative analysis of YOLO11-4K detection results: (a) distribution of bounding box sizes, (b) width–height relationship highlighting extremely small and large objects.