HitoMi-Cam: A Shape-Agnostic Person Detection Method Using the Spectral Characteristics of Clothing
Shuji Ono
TL;DR
CNN-based person detectors depend on shape cues and inherit the biases of their training data. This work proposes HitoMi-Cam, a shape-agnostic detector that instead uses the spectral signature of clothing measured in four narrow bands. Implemented on a low-cost edge device (Raspberry Pi 5) with a 4-band multispectral camera, it runs in real time (23.2 fps) and shows strong presence-detection performance in challenging scenarios, notably a simulated search and rescue (SAR) setting where CNNs underperform (AP up to 93.5% vs. 53.8% for the best CNN). A lightweight MLP is trained offline; at run time it classifies each pixel to produce a clothing map, and post-processing converts the map into bounding boxes, each output with a confidence of 1.0 when clothing is detected. HitoMi-Cam complements traditional detectors by robustly detecting clothing materials in unpredictable postures and environments, which gives it practical value for disaster rescue and edge-enabled surveillance where shape-based methods struggle. The results also highlight the need for integration with CNNs and for further robustness enhancements.
Abstract
Convolutional neural network (CNN)-based object detection is widely used, but its dependence on shape degrades performance for postures not included in the training data. Building on our previous simulation study published in this journal, this work implements and evaluates the spectral-based approach on physical hardware to address this limitation. Specifically, this paper introduces HitoMi-Cam, a lightweight, shape-agnostic person detection method that uses the spectral reflectance properties of clothing. The author implemented the system on a resource-constrained edge device without a GPU to assess its practical viability. The results indicate that a processing speed of 23.2 frames per second (fps) at 253×190 pixels is achievable, suggesting that the method can be used for real-time applications. In a simulated search and rescue scenario in which CNN performance declines, HitoMi-Cam achieved an average precision (AP) of 93.5%, surpassing the compared CNN models (best AP of 53.8%). Across all evaluation scenarios, false positives remained minimal. This study positions HitoMi-Cam not as a replacement for CNN-based detectors but as a complementary tool under specific conditions. The results indicate that spectral-based person detection can be a viable option for real-time operation on edge devices in real-world environments where shapes are unpredictable, such as disaster rescue.
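The pipeline summarized in the TL;DR (pixel-wise MLP classification, a clothing map, then bounding boxes with a confidence of 1.0) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 4-2-1 network and its hand-set weights, the 0.5 threshold, the 4-connected grouping, the `min_area` noise filter, and all function names. A real model would be trained offline on measured clothing spectra.

```python
import math
from collections import deque

# Hypothetical hand-set weights for a toy 4 -> 2 -> 1 MLP (not the paper's model).
W1 = [[4.0, -4.0, 0.0, 0.0],
      [0.0, 0.0, 4.0, -4.0]]
B1 = [0.0, 0.0]
W2 = [3.0, 3.0]
B2 = -2.0


def is_clothing(pixel, threshold=0.5):
    """Classify one 4-band pixel (values assumed to lie in [0, 1])."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, pixel)) + b)  # ReLU layer
              for row, b in zip(W1, B1)]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return 1.0 / (1.0 + math.exp(-logit)) > threshold  # sigmoid output


def clothing_map(image):
    """Apply the classifier independently to every pixel; no shape cues are used."""
    return [[is_clothing(pixel) for pixel in row] for row in image]


def bounding_boxes(mask, min_area=2):
    """Group 4-connected clothing pixels and return one box per group."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    detections = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            ys, xs = [], []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:  # assumed noise filter: drop tiny components
                detections.append({
                    "box": (min(xs), min(ys), max(xs), max(ys)),
                    "confidence": 1.0,  # fixed confidence, as stated in the TL;DR
                })
    return detections


# Synthetic 8x6 frame: uniform background, one 3x2 clothing patch, one stray pixel.
BACKGROUND = (0.3, 0.3, 0.3, 0.3)
CLOTHING = (0.9, 0.1, 0.8, 0.2)
image = [[BACKGROUND] * 8 for _ in range(6)]
for y in (1, 2):
    for x in (2, 3, 4):
        image[y][x] = CLOTHING
image[5][7] = CLOTHING  # isolated pixel, removed by the min_area filter

detections = bounding_boxes(clothing_map(image))
print(detections)
```

Because each pixel is classified on its spectrum alone, the result does not depend on the posture or outline of the person. The box coordinates are `(x_min, y_min, x_max, y_max)` in pixel indices.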
