Object Detection Approaches to Identifying Hand Images with High Forensic Values
Thanh Thi Nguyen, Campbell Wilson, Imad Khan, Janis Dalins
TL;DR
The paper tackles the identification of hand images with high forensic value by evaluating YOLOv8 and Vision Transformer detectors across multiple datasets, including a version of the 11k hands dataset newly labeled with a semi-automatic approach. It demonstrates that YOLOv8 variants generally outperform DETR/DETA models and that training on a diverse, combined dataset yields the strongest generalization, enabling rapid extraction of high-forensic-value frames from images and videos. The work provides bounding-box labels for the 11k hands dataset, compares model performance across EgoHands, Open Images, and combined data, and presents a practical pipeline for forensic analysts to prioritize frames for review while addressing ethical considerations. Overall, the approach offers a scalable, data-driven tool to streamline forensic image analysis and reduce expert workload, with plans for future handling of occlusion, lighting, and privacy concerns.
Abstract
Forensic science plays a crucial role in legal investigations, and the use of advanced technologies, such as object detection based on machine learning methods, can enhance the efficiency and accuracy of forensic analysis. Human hands are unique and can leave distinct patterns, marks, or prints that can be utilized for forensic examinations. This paper compares various machine learning approaches to hand detection and presents the results of applying the best-performing model to identify images of significant importance in forensic contexts. We fine-tune YOLOv8 and vision transformer-based object detection models on four hand image datasets, including the 11k hands dataset, for which we annotate bounding boxes using a semi-automatic approach. Two YOLOv8 variants, i.e., YOLOv8 nano (YOLOv8n) and YOLOv8 extra-large (YOLOv8x), and two vision transformer variants, i.e., DEtection TRansformer (DETR) and Detection Transformers with Assignment (DETA), are employed for the experiments. Experimental results demonstrate that the YOLOv8 models outperform DETR and DETA on all datasets. The experiments also show that the YOLOv8 approaches achieve superior performance compared with existing hand detection methods, which were based on YOLOv3 and YOLOv4 models. Applying our fine-tuned YOLOv8 models to identify hand images (or frames in a video) with high forensic value produces excellent results, significantly reducing the time required by forensic experts. This implies that our approaches can be implemented effectively for real-world applications in forensics or related fields.
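
To make the frame-prioritization idea concrete, the following is a minimal sketch of how per-frame hand detections might be turned into a ranked review list. It is not the authors' pipeline: the `Detection` type, the scoring rule (largest confident hand box as a fraction of the frame area), and the thresholds `min_confidence` and `min_score` are all hypothetical choices made for illustration. The detections are assumed to come from some hand detector, for example a fine-tuned YOLOv8 model, run beforehand.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    """One detected hand (hypothetical container, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float                       # detector confidence in [0, 1]


def frame_score(detections: Sequence[Detection], frame_width: int,
                frame_height: int, min_confidence: float = 0.5) -> float:
    """Assumed scoring rule: area of the largest sufficiently confident
    hand box, expressed as a fraction of the whole frame area."""
    frame_area = frame_width * frame_height
    best = 0.0
    for det in detections:
        if det.confidence < min_confidence:
            continue  # ignore low-confidence boxes
        x1, y1, x2, y2 = det.box
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        best = max(best, area / frame_area)
    return best


def select_frames(per_frame_detections: Sequence[Sequence[Detection]],
                  frame_width: int, frame_height: int,
                  min_score: float = 0.1,
                  min_confidence: float = 0.5) -> List[Tuple[int, float]]:
    """Return (frame_index, score) pairs whose score reaches min_score,
    sorted so the highest-scoring frames come first."""
    scored = [
        (index, frame_score(dets, frame_width, frame_height, min_confidence))
        for index, dets in enumerate(per_frame_detections)
    ]
    kept = [(index, score) for index, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy detections for five 100x100 frames.
    frames = [
        [],                                      # no hands detected
        [Detection((0, 0, 50, 50), 0.9)],        # medium hand, confident
        [Detection((0, 0, 80, 80), 0.3)],        # large box, low confidence
        [Detection((10, 10, 30, 30), 0.8)],      # small hand
        [Detection((0, 0, 100, 60), 0.7)],       # large hand, confident
    ]
    for index, score in select_frames(frames, 100, 100):
        print(f"frame {index}: score {score:.2f}")
```

A size-based score is only one plausible proxy for forensic value; an analyst could equally rank by detection confidence or by the number of hands visible, and the thresholds would need tuning on real footage.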
