New keypoint-based approach for recognising British Sign Language (BSL) from sequences

Oishi Deb, KR Prajwal, Andrew Zisserman

TL;DR

This paper addresses the recognition of British Sign Language (BSL) words in continuous signing using a keypoint-based approach. It introduces a Transformer model that processes sequences of 2D keypoints extracted from the face, hands, and body pose with MediaPipe, achieving substantially lower computational cost than RGB-based methods while reaching 60% top-5 accuracy on unseen data. The work demonstrates that keypoint representations can enable real-time, signer-independent BSL recognition and lays groundwork for future multimodal extensions and 3D pose estimation. The findings suggest practical potential for efficient, accessible sign-language recognition systems, with room for accuracy gains through modality fusion and more advanced pose estimation techniques.

Abstract

In this paper, we present a novel keypoint-based classification model designed to recognise British Sign Language (BSL) words within continuous signing sequences. Our model's performance is assessed using the BOBSL dataset, revealing that the keypoint-based approach surpasses its RGB-based counterpart in computational efficiency and memory usage. Furthermore, it offers expedited training times and demands fewer computational resources. To the best of our knowledge, this is the inaugural application of a keypoint-based model for BSL word classification, rendering direct comparisons with existing works unavailable.

Paper Structure

This paper contains 17 sections and 6 figures.

Figures (6)

  • Figure 1: Transformer Architecture for Frame-wise and Trajectory-wise attention model.
  • Figure 2: Shift augmentation shown on two distinct frames, alongside the original keypoint representation before augmentation. For visual clarity, the shift is depicted as 15 coordinates on both the x- and y-axes; in the code, shifts range between -2 and 2. The RGB images are overlaid for visualization only and are not used in model training.
  • Figure 3: Scale augmentation shown on two distinct frames, alongside the original keypoint representation before augmentation. For visualization, the scale is depicted as 75 percent; in the actual code, scaling varies between 90 and 110 percent.
  • Figure 4: Horizontal flip augmentation shown on two different frames, alongside the original keypoint representation before augmentation.
  • Figure 5: Rotation augmentation shown on two different frames, alongside the original keypoint representation before augmentation.
  • ...and 1 more figure
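
The four augmentations named in the captions for Figures 2 to 5 can be sketched on a keypoint array as follows. This is a minimal illustration, not the authors' code. The shift range (-2 to 2) and scale range (90 to 110 percent) come from the captions; everything else is an assumption: the `(frames, keypoints, 2)` array layout, drawing one random value per sequence rather than per frame, scaling and rotating about the sequence centroid, the `max_deg` rotation range, and the `width` argument for flipping. All function names are hypothetical.

```python
import numpy as np


def shift(kp, rng, max_shift=2.0):
    """Translate every keypoint by one random (dx, dy) offset.

    The [-2, 2] range follows the Figure 2 caption; applying a single
    offset to the whole sequence is an assumption.
    """
    offset = rng.uniform(-max_shift, max_shift, size=2)
    return kp + offset


def scale(kp, rng, low=0.9, high=1.1):
    """Scale keypoints by a random factor in [low, high].

    The 90-110 percent range follows the Figure 3 caption; scaling about
    the centroid of all keypoints is an assumption.
    """
    centre = kp.reshape(-1, 2).mean(axis=0)
    factor = rng.uniform(low, high)
    return (kp - centre) * factor + centre


def hflip(kp, width):
    """Mirror x-coordinates across a frame of the given width.

    A complete flip would also swap left/right keypoint labels (for
    example, left and right hands); that step is omitted here.
    """
    out = kp.copy()
    out[..., 0] = width - out[..., 0]
    return out


def rotate(kp, rng, max_deg=10.0):
    """Rotate keypoints about their centroid by a random angle.

    The source does not state a rotation range; max_deg is hypothetical.
    """
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    centre = kp.reshape(-1, 2).mean(axis=0)
    return (kp - centre) @ rot.T + centre
```

Each function takes an array of shape `(frames, keypoints, 2)` and returns a new array of the same shape, so the augmentations can be chained, for example `rotate(scale(shift(kp, rng), rng), rng)` with `rng = np.random.default_rng()`.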