Enhancing YOLOv11n for Reliable Child Detection in Noisy Surveillance Footage
Khanh Linh Tran, Minh Nguyen Dang, Thien Nguyen Trong, Hung Nguyen Quoc, Linh Nguyen Kieu
TL;DR
Detecting children in noisy surveillance footage is hampered by occlusion, small object size, and poor lighting. The authors propose a deployment-friendly pipeline that fine-tunes the lightweight YOLOv11n on a child-only surveillance dataset augmented with domain-specific compositing and image degradations, and applies Slicing Aided Hyper Inference (SAHI) at inference to boost recall for small or partially visible children. On a child-only subset of the Roboflow Daycare dataset, they report mAP@0.5 = 0.967 and mAP@0.5:0.95 = 0.783, gains of 0.7 and 2.3 percentage points over the baseline YOLOv11n, while remaining suitable for real-time edge deployment. The approach stays practical for low-cost, resource-constrained deployments and lays groundwork for future multi-camera and domain-adaptive enhancements in safety-critical daycare surveillance.
Abstract
This paper presents a practical and lightweight solution for enhancing child detection in low-quality surveillance footage, a critical component in real-world missing child alert and daycare monitoring systems. Building upon the efficient YOLOv11n architecture, we propose a deployment-ready pipeline that improves detection under challenging conditions commonly found in existing CCTV infrastructures, including occlusion, small object size, low resolution, motion blur, and poor lighting. Our approach introduces a domain-specific augmentation strategy that synthesizes realistic child placements using spatial perturbations such as partial visibility, truncation, and overlaps, combined with photometric degradations including lighting variation and noise. To improve recall of small and partially occluded instances, we integrate Slicing Aided Hyper Inference (SAHI) at inference time. All components are trained and evaluated on a filtered, child-only subset of the Roboflow Daycare dataset. Compared to the baseline YOLOv11n, our enhanced system achieves a mean Average Precision at 0.5 IoU (mAP@0.5) of 0.967 and a mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) of 0.783, yielding absolute improvements of 0.7 and 2.3 percentage points, respectively, without architectural changes. Importantly, the entire pipeline remains compatible with low-power edge devices and supports real-time performance, making it particularly well suited for low-cost or resource-constrained industrial surveillance deployments. The example augmented dataset and the source code used to generate it are available at: https://github.com/html-ptit/Data-Augmentation-YOLOv11n-child-detection
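The augmentation idea described above (pasting a child crop so that it may be truncated by the frame border, then degrading lighting and adding noise) can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names `paste_with_truncation` and `degrade`, their parameters, and the choice of a brightness gain plus Gaussian noise are assumptions made purely for illustration.

```python
import numpy as np


def paste_with_truncation(background, crop, x, y):
    """Paste `crop` onto a copy of `background` with its top-left corner at (x, y).

    (x, y) may lie partly outside the frame, which truncates the pasted object.
    Returns the composited image and the visible box (x1, y1, x2, y2),
    or None for the box when nothing is visible.
    Hypothetical helper, not part of any library.
    """
    bg_h, bg_w = background.shape[:2]
    crop_h, crop_w = crop.shape[:2]

    # Visible region of the crop, expressed in background coordinates.
    x1, y1 = max(x, 0), max(y, 0)
    x2, y2 = min(x + crop_w, bg_w), min(y + crop_h, bg_h)

    out = background.copy()
    if x1 >= x2 or y1 >= y2:
        return out, None

    # Copy only the part of the crop that falls inside the frame.
    out[y1:y2, x1:x2] = crop[y1 - y:y2 - y, x1 - x:x2 - x]
    return out, (x1, y1, x2, y2)


def degrade(image, gain, noise_sigma, rng):
    """Scale brightness by `gain`, then add zero-mean Gaussian noise (assumed model)."""
    img = image.astype(np.float32) * gain
    img = img + rng.normal(0.0, noise_sigma, size=img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.zeros((100, 100, 3), dtype=np.uint8)
    child = np.full((40, 20, 3), 255, dtype=np.uint8)  # stand-in for a child crop
    composite, box = paste_with_truncation(frame, child, x=90, y=-10)
    noisy = degrade(composite, gain=0.6, noise_sigma=8.0, rng=rng)
    print(box, noisy.shape, noisy.dtype)
```

The returned box covers only the visible part of the pasted crop, so the label written for a truncated child matches what a detector can actually see in the frame.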

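The slicing step behind sliced inference can be sketched in a few lines of plain Python. This is a simplified illustration of the general idea (run the detector on overlapping tiles, then map tile-local boxes back to the full frame); it is not the SAHI library's implementation. The helper names `tile_windows` and `to_full_frame`, the tile size, and the overlap ratio are assumptions.

```python
def tile_windows(img_w, img_h, tile, overlap):
    """Return (x1, y1, x2, y2) windows of side `tile` that cover the image.

    Neighbouring windows overlap by roughly `overlap` (a fraction in [0, 1)).
    The last window in each row and column is shifted inward so it stays in frame.
    Hypothetical helper, not part of any library.
    """
    stride = max(1, int(tile * (1.0 - overlap)))

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # final window flush with the border
        return positions

    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in starts(img_h)
        for x in starts(img_w)
    ]


def to_full_frame(box, window):
    """Shift a tile-local box (x1, y1, x2, y2) into full-image coordinates."""
    off_x, off_y = window[0], window[1]
    return (box[0] + off_x, box[1] + off_y, box[2] + off_x, box[3] + off_y)


if __name__ == "__main__":
    windows = tile_windows(img_w=1000, img_h=600, tile=400, overlap=0.25)
    for w in windows:
        print(w)
    # A detection found at (10, 20, 50, 80) inside the last tile:
    print(to_full_frame((10, 20, 50, 80), windows[-1]))
```

A small child that occupies only a few pixels of the full frame takes up a larger share of each tile, which is why slicing tends to help recall. Detections that appear in more than one overlapping tile would still need to be merged, for example with non-maximum suppression; that step is omitted here.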