Enhancing YOLOv11n for Reliable Child Detection in Noisy Surveillance Footage

Khanh Linh Tran, Minh Nguyen Dang, Thien Nguyen Trong, Hung Nguyen Quoc, Linh Nguyen Kieu

TL;DR

Child detection in noisy surveillance footage is hampered by occlusion, small object size, and poor lighting. The authors propose a deployment-friendly pipeline that fine-tunes the lightweight YOLOv11n on a child-only surveillance dataset augmented with domain-specific compositing and image degradations, and applies SAHI at inference to boost recall for small or partially visible children. On a child-only subset of the Roboflow Daycare dataset, they report mAP@0.5 = 0.967 and mAP@0.5:0.95 = 0.783, gains of 0.7 and 2.3 percentage points over the baseline YOLOv11n, while preserving real-time edge deployment. The approach remains practical for low-cost, resource-constrained deployments and lays groundwork for future multi-camera and domain-adaptive enhancements in safety-critical daycare surveillance.

Abstract

This paper presents a practical and lightweight solution for enhancing child detection in low-quality surveillance footage, a critical component in real-world missing-child alert and daycare monitoring systems. Building upon the efficient YOLOv11n architecture, we propose a deployment-ready pipeline that improves detection under challenging conditions commonly found in existing CCTV infrastructures, including occlusion, small object size, low resolution, motion blur, and poor lighting. Our approach introduces a domain-specific augmentation strategy that synthesizes realistic child placements using spatial perturbations such as partial visibility, truncation, and overlaps, combined with photometric degradations including lighting variation and noise. To improve recall of small and partially occluded instances, we integrate Slicing Aided Hyper Inference (SAHI) at inference time. All components are trained and evaluated on a filtered, child-only subset of the Roboflow Daycare dataset. Our enhanced system achieves a mean Average Precision at 0.5 IoU (mAP@0.5) of 0.967 and a mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) of 0.783, absolute improvements of 0.7 and 2.3 percentage points, respectively, over the baseline YOLOv11n, without architectural changes. Importantly, the entire pipeline maintains compatibility with low-power edge devices and supports real-time performance, making it particularly well suited for low-cost or resource-constrained industrial surveillance deployments. The example augmented dataset and the source code used to generate it are available at: https://github.com/html-ptit/Data-Augmentation-YOLOv11n-child-detection
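
The slicing idea behind SAHI can be illustrated with a minimal sketch: the frame is divided into overlapping tiles, the detector runs on each tile so that small children occupy more of the input, and the resulting boxes are shifted back to full-frame coordinates. The code below is not the SAHI library's API and not the authors' implementation; the function names `slice_boxes` and `to_full_image`, the 512-pixel tile size, and the 0.2 overlap ratio are all assumptions chosen for illustration. The final step of merging duplicate detections from overlapping tiles is omitted.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def slice_boxes(image_w: int, image_h: int,
                slice_w: int = 512, slice_h: int = 512,
                overlap: float = 0.2) -> List[Box]:
    """Return overlapping tile windows that together cover the whole image.

    Tile size and overlap are illustrative defaults, not values from the paper.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the origins of consecutive tiles.
    step_x = max(1, int(slice_w * (1.0 - overlap)))
    step_y = max(1, int(slice_h * (1.0 - overlap)))

    boxes: List[Box] = []
    y = 0
    while True:
        # Clamp the last row of tiles to the image border.
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)
        x = 0
        while True:
            # Clamp the last column of tiles to the image border.
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)
            boxes.append((x_min, y_min, x_max, y_max))
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes


def to_full_image(box: Box, tile: Box) -> Box:
    """Shift a detection from tile coordinates back to full-frame coordinates."""
    x0, y0 = tile[0], tile[1]
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)
```

For example, `slice_boxes(1024, 512, 512, 512, 0.0)` yields two side-by-side tiles, and a box detected in the second tile is mapped back with `to_full_image(box, tile)`. In practice the overlap matters because a child cut by one tile boundary is then fully visible in a neighbouring tile.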

Paper Structure (19 sections, 2 equations, 3 figures, 4 tables)

This paper contains 19 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 2: Examples of each effect used in Image-Level Degradations
  • Figure 3: Image-level degradation examples.
  • Figure 4: Qualitative detection results from fine-tuned YOLOv11n models.
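
The image-level degradations named in the figure captions above correspond to the photometric perturbations the abstract mentions, lighting variation and noise. A minimal sketch of that idea, using only the Python standard library on a grayscale image stored as nested lists, might look like the following. The function name `degrade`, the brightness range, and the noise level are hypothetical; the paper's actual effects and parameters are not given here, and a real pipeline would more likely operate on NumPy or OpenCV arrays.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image, pixel values in 0..255


def degrade(image: Image, rng: random.Random,
            brightness_range: Tuple[float, float] = (0.5, 1.2),
            noise_sigma: float = 10.0) -> Image:
    """Apply one random brightness scale and per-pixel Gaussian noise.

    The default range and sigma are illustrative assumptions only.
    """
    # A single global scale simulates a darker or brighter scene.
    scale = rng.uniform(*brightness_range)
    return [
        [
            # Add sensor-like noise, then clamp back into the valid range.
            min(255, max(0, round(p * scale + rng.gauss(0.0, noise_sigma))))
            for p in row
        ]
        for row in image
    ]
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible across training runs, which makes it easier to compare a baseline against an augmented model.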