Novel Human Machine Interface via Robust Hand Gesture Recognition System using Channel Pruned YOLOv5s Model

Abir Sen, Tapas Kumar Mishra, Ratnakar Dash

TL;DR

The study tackles real-time hand gesture recognition for interactive systems in cluttered, variable lighting environments by introducing a lightweight channel-pruned YOLOv5s pipeline that jointly detects and classifies gestures. The method follows five stages: data acquisition, preprocessing, detection/classification with YOLOv5s, channel pruning, and deployment for real-time HCI, including a fine-tuning step to recover performance after pruning. Experimental results on the private NITR-HGR and public ASL datasets show substantial reductions in parameters and GFLOPs with pruning, modest decreases in mAP as pruning increases, and sustained high-frame-rate operation (>60 fps) suitable for ms-scale responsiveness. The pruned model is demonstrated in a novel HCI that controls VLC and Spotify in real time, with robustness enhancements achieved via frame-rate throttling to prevent command misfires, highlighting practical applicability and future potential for multimodal, dynamic gesture interfaces.
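The summary does not say how the frame-rate throttling is implemented. As a rough illustration only, one plausible reading is a minimum time gap between emitted commands, so that a gesture held across many consecutive frames triggers the player once rather than dozens of times. The class name `CommandThrottle`, the helper `simulate`, and the 1-second interval below are all hypothetical and not taken from the paper.

```python
class CommandThrottle:
    """Allow a command to fire at most once per `min_interval_s` seconds.

    Hypothetical helper: the interval value and this design are assumptions,
    not the authors' implementation.
    """

    def __init__(self, min_interval_s=1.0):
        self.min_interval_s = min_interval_s
        self._last_fired_s = None  # timestamp of the last emitted command

    def should_fire(self, now_s):
        """Return True if enough time has passed since the last command."""
        if self._last_fired_s is None or now_s - self._last_fired_s >= self.min_interval_s:
            self._last_fired_s = now_s
            return True
        return False


def simulate(fps=60.0, n_frames=120, min_interval_s=1.0):
    """Count commands emitted when the same gesture is detected in every frame."""
    throttle = CommandThrottle(min_interval_s)
    fired = 0
    for frame_index in range(n_frames):
        now_s = frame_index / fps  # synthetic timestamp of this frame
        if throttle.should_fire(now_s):
            fired += 1
    return fired


def frame_time_ms(fps):
    """Per-frame time budget in milliseconds at a given frame rate."""
    return 1000.0 / fps
```

With these assumed values, a gesture held for 120 frames at 60 fps emits a command on the first frame and again one second later, instead of once per frame. At 60 fps each frame has a budget of roughly 16.7 ms, which is the scale the "ms-scale responsiveness" claim refers to.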

Abstract

Hand gesture recognition (HGR) is a vital component in enhancing the human-computer interaction experience, particularly in multimedia applications such as virtual reality, gaming, and smart home automation systems. By accurately detecting and recognizing gestures, these systems let users control and navigate such applications seamlessly. However, in real-time scenarios, the performance of a gesture recognition system can be degraded by complex backgrounds, low-light illumination, and occlusion. Another challenge is building a fast and robust gesture-controlled human-computer interface (HCI) that works in real time. The overall objective of this paper is to develop an efficient hand gesture detection and classification model using a channel-pruned YOLOv5-small model and to use that model to build a gesture-controlled HCI with a quick response time (in ms) and a high detection speed (in fps). First, the YOLOv5s model is chosen for the gesture detection task. Next, the model is simplified using a channel-pruning algorithm. After that, the pruned model is fine-tuned to ensure detection efficiency. We have compared our suggested scheme with other state-of-the-art works, and our model shows superior results in terms of mAP (mean average precision), precision (%), recall (%), F1-score (%), inference time (in ms), and detection speed (in fps). Our proposed method paves the way for deploying a pruned YOLOv5s model in a real-time gesture-command-based HCI that controls applications such as the VLC media player and the Spotify player using correctly classified gesture commands. The average detection speed of our proposed system exceeds 60 frames per second (fps) in real time, which meets the requirements of real-time application control.
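The abstract does not state which criterion the channel-pruning algorithm uses. A common approach for YOLO-style networks, assumed here purely for illustration, ranks channels by the magnitude of a per-channel importance score (for example a batch-normalization scale factor) and removes the lowest-ranked fraction across the whole network. The function name `channel_keep_masks`, the layer names, and the scores below are hypothetical.

```python
def channel_keep_masks(layer_scales, prune_rate):
    """Return a keep/prune mask per layer using one global magnitude threshold.

    layer_scales: dict mapping layer name -> list of per-channel importance
                  scores (assumed here to be BN scale factors).
    prune_rate:   fraction of all channels to remove, in [0, 1).
    """
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")

    # Rank every channel in the network by absolute score.
    all_scores = sorted(abs(s) for scales in layer_scales.values() for s in scales)
    n_prune = int(len(all_scores) * prune_rate)
    # Channels with a score at or below the threshold are pruned.
    threshold = all_scores[n_prune - 1] if n_prune > 0 else float("-inf")

    masks = {}
    for name, scales in layer_scales.items():
        mask = [abs(s) > threshold for s in scales]
        if not any(mask):
            # Never remove a whole layer: keep its strongest channel.
            best = max(range(len(scales)), key=lambda i: abs(scales[i]))
            mask[best] = True
        masks[name] = mask
    return masks


# Hypothetical scores for two layers (10 channels in total).
example_scales = {
    "conv1": [0.9, 0.05, 0.6, 0.4],
    "conv2": [0.01, 0.02, 0.7, 0.8, 0.3, 0.5],
}
example_masks = channel_keep_masks(example_scales, prune_rate=0.2)
kept = sum(sum(mask) for mask in example_masks.values())
```

In this toy case a 20% pruning rate removes the two weakest channels, both in `conv2`, leaving 8 of 10. Tied scores at the threshold would be pruned together, so the realized rate can slightly exceed the requested one. In practice the masks would then be used to rebuild narrower convolution layers before the fine-tuning step the abstract describes.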

Paper Structure

This paper contains 20 sections, 7 equations, 6 figures, 4 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Block diagram of our suggested scheme.
  • Figure 2: Pipeline of channel-pruned YOLOv5s model.
  • Figure 3: Layer-wise channel counts before and after channel pruning at a 15% pruning rate.
  • Figure 4: Illustration of two correctly classified gesture samples using the fine-tuned channel-pruned YOLOv5s model: (A) normal person, (B) physically impaired person (with a bent hand).
  • Figure 5: Illustration of the gesture-controlled VLC player in real-time.
  • ...and 1 more figure