Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition
Mazen Balat, Rewaa Awaad, Ahmed B. Zaky, Salah A. Aly
TL;DR
This paper tackles Arabic Sign Language recognition by combining multiple deep learning architectures (MobileNetV3, ResNet50, EfficientNet-B2) with Explainable AI (XAI) through Grad-CAM to improve interpretability. The models are evaluated on two datasets, ArSL2018 and AASL. Class imbalance is addressed by undersampling ArSL2018 and applying extensive data augmentation, and stratified 5-fold cross-validation is used to assess generalization. EfficientNet-B2 performs best, reaching $99.48\%$ accuracy on ArSL2018 and $98.99\%$ on AASL, while Grad-CAM visualizations provide transparent explanations of model decisions. The results demonstrate strong accuracy and interpretability, with potential impact on healthcare, education, and inclusive communication, and pave the way for broader, multilingual sign-language recognition systems.
Abstract
This study introduces an integrated approach to recognizing Arabic Sign Language (ArSL) using state-of-the-art deep learning models (MobileNetV3, ResNet50, and EfficientNet-B2), combined with explainable AI (XAI) techniques to improve interpretability. The models are evaluated on the ArSL2018 and RGB Arabic Alphabets Sign Language (AASL) datasets, where EfficientNet-B2 achieves peak accuracies of 99.48\% and 98.99\%, respectively. Key contributions include data augmentation to mitigate class imbalance, stratified 5-fold cross-validation to assess generalization, and Grad-CAM visualizations that make model decisions transparent. The proposed system not only sets new benchmarks in recognition accuracy but also emphasizes interpretability, making it suitable for applications in healthcare, education, and inclusive communication technologies.
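The class-balancing and evaluation protocol mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `undersample` and `stratified_k_fold`, the toy label counts, and the choice to undersample every class down to the rarest class are assumptions made for illustration, and only the standard library is used. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally take the place of the hand-written splitter.

```python
import random
from collections import Counter, defaultdict


def undersample(labels, seed=0):
    """Return sorted indices so every class keeps as many samples as the rarest class.

    Assumption: the target count is the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class proportions mirror the full set."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin so every fold gets its share.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val


# Toy imbalanced labels (hypothetical counts, not the ArSL2018 distribution).
labels = ["alef"] * 50 + ["beh"] * 30 + ["teh"] * 20
kept = undersample(labels)
balanced = [labels[i] for i in kept]
splits = list(stratified_k_fold(balanced, k=5))
```

Each of the five folds serves once as the validation set while the remaining four are used for training, so every sample is validated exactly once and each fold preserves the balanced class proportions.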
