Interpretable Underwater Diver Gesture Recognition

Sudeep Mangalvedhekar, Shreyas Nahar, Sudarshan Maskare, Kaushal Mahajan, Anant Bagade

TL;DR

The paper tackles underwater diver–AUV communication by developing a deep learning-based gesture recognition system trained on the CADDY dataset, achieving state-of-the-art accuracy of 98.01% with a ResNet18 backbone and real-time video processing via a rolling-average frame classifier. It emphasizes interpretability by applying Integrated Gradients and Occlusion Sensitivity to visualize model decisions, enhancing trust in autonomous underwater systems. Key contributions include a rigorous evaluation of multiple backbones, transfer-learning strategies, and a practical video pipeline, demonstrating strong performance across diverse underwater environments. The work advances practical, real-time, and interpretable underwater gesture recognition, enabling safer and more reliable human–robot collaboration in challenging aquatic settings.
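
The rolling-average frame classifier mentioned above can be illustrated with a minimal sketch. It assumes each video frame has already been classified into a probability vector by some model. The class name `RollingAverageClassifier`, the `window` parameter, and its default of 5 are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import deque

import numpy as np


class RollingAverageClassifier:
    """Smooth per-frame class probabilities over the most recent frames.

    Hypothetical helper: the window size and interface are assumptions.
    """

    def __init__(self, num_classes, window=5):
        self.num_classes = num_classes
        # deque with maxlen drops the oldest frame automatically
        self.history = deque(maxlen=window)

    def update(self, frame_probs):
        """Add one frame's probabilities; return (label, averaged probabilities)."""
        probs = np.asarray(frame_probs, dtype=float)
        if probs.shape != (self.num_classes,):
            raise ValueError("expected a vector of length num_classes")
        self.history.append(probs)
        # Average over the frames currently held in the window
        mean_probs = np.stack(list(self.history)).mean(axis=0)
        return int(np.argmax(mean_probs)), mean_probs
```

Averaging over a short window means a single misclassified frame is unlikely to flip the reported gesture, at the cost of a few frames of latency when the diver actually changes gesture.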

Abstract

In recent years, the usage and applications of Autonomous Underwater Vehicles (AUVs) have grown rapidly. Interaction between divers and AUVs remains an integral part of many of these applications, which makes building robust and efficient underwater gesture recognition systems extremely important. In this paper, we propose an underwater gesture recognition system trained on the Cognitive Autonomous Diving Buddy (CADDY) underwater gesture dataset using deep learning. It achieves 98.01% accuracy on the dataset, which to the best of our knowledge is the best performance achieved on this dataset at the time of writing. We also improve the interpretability of the gesture recognition system by using XAI techniques to visualize the model's predictions.
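
Occlusion Sensitivity, one of the XAI techniques named in the TL;DR, can be sketched in a few lines. The idea is to mask one image region at a time and record how much the model's score for a class drops. The function below is an illustrative assumption, not the authors' implementation: `score_fn` stands in for any callable that returns the model's score for the class of interest, and `patch`, `stride`, and `fill` are hypothetical parameters.

```python
import numpy as np


def occlusion_sensitivity(image, score_fn, patch=4, stride=4, fill=0.0):
    """Return an (H, W) heatmap of score drops caused by occluding patches.

    image:    array of shape (H, W) or (H, W, C)
    score_fn: callable mapping an image to a scalar class score (assumed)
    """
    h, w = image.shape[:2]
    baseline = score_fn(image)
    heatmap = np.zeros((h, w), dtype=float)
    counts = np.zeros((h, w), dtype=float)
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            # A large drop means the model relied on this region
            drop = baseline - score_fn(occluded)
            heatmap[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average overlapping patches; pixels never covered stay at zero
    return heatmap / np.maximum(counts, 1)
```

Regions with high values in the returned heatmap are those whose removal hurts the prediction most. For a gesture classifier one would hope these coincide with the diver's hands rather than the background.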

Paper Structure (17 sections, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Gesture Classes in the CADDY dataset
  • Figure 2: Class distribution in CADDY dataset
  • Figure 3: Class distribution per scenario in CADDY dataset
  • Figure 4: MobileNet Architecture
  • Figure 5: ResNet block
  • ...and 4 more figures