Recognising BSL Fingerspelling in Continuous Signing Sequences

Alyssa Chan; Taein Kwon; Andrew Zisserman

Abstract

Fingerspelling is a critical component of British Sign Language (BSL), used to spell proper names, technical terms, and words that lack established lexical signs. Fingerspelling recognition is challenging due to the rapid pace of signing and common letter omissions by native signers, while existing BSL fingerspelling datasets are either small in scale or temporally and letter-wise inaccurate. In this work, we introduce a new large-scale BSL fingerspelling dataset, FS23K, constructed using an iterative annotation framework. In addition, we propose a fingerspelling recognition model that explicitly accounts for bi-manual interactions and mouthing cues. As a result, with refined annotations, our approach halves the character error rate (CER) compared to the prior state of the art on fingerspelling recognition. These findings demonstrate the effectiveness of our method and highlight its potential to support future research in sign language understanding and scalable, automated annotation pipelines. The project page can be found at https://taeinkwon.com/projects/fs23k/.
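The character error rate quoted above is conventionally defined as the Levenshtein (edit) distance between the predicted and reference letter sequences, divided by the reference length. The sketch below illustrates that standard definition for a single word; the function names are hypothetical, and whether the paper normalises per word or over the whole test set is an assumption not stated here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one hypothesis against one reference (assumed per-word)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For instance, a prediction that drops one letter of a nine-letter name, as in `cer("zisserman", "ziserman")`, costs a single deletion, giving a CER of 1/9.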

Paper Structure (41 sections, 3 equations, 15 figures, 6 tables)

Introduction
Related Work
Fingerspelling Datasets
BSL Datasets
Fingerspelling Detection and Recognition
Fingerspelling Recognition Model
Architecture
Hand Features
Lip Features
Network
Loss functions
Training Implementation Details
Letter-level masking
Data augmentation
Dataset
...and 26 more sections

Figures (15)

Figure 1: BSL fingerspelling recognition. Video frames from continuous signing where a fingerspelling temporal interval is detected, and hand and lip features are used to correctly recognise the signed letters.
Figure 2: The BSL alphabet. Unlike many other sign languages, British Sign Language (BSL) employs bi-manual fingerspelling, which poses additional challenges for recognition due to frequent occlusions between the two hands. Note, these examples are for a left-handed signer.
Figure 3: Letter 'p' being signed by three different signers. In (a) the right hand is outstretched, differing from the template fingerspelling. Additionally, in (c) the signer is slightly turned to the left, making the position of the left hand more ambiguous.
Figure 4: Fingerspelling recognition network architecture. The model leverages two complementary feature modalities: lip features extracted using AUTO-AVSR [ma2023auto] and hand features obtained from HAMER [pavlakos_reconstructing_2023]. Each modality is first passed through its own linear projection to align feature dimensions, followed by a separate Transformer encoder. The encoded features are then concatenated and further processed by a Transformer encoder. Finally, a two-layer MLP predicts per-frame letter labels, which are also used as inputs to the CTC decoder. Feature dimensions are shown beside the arrows. The 384-dimensional hand feature vector covers both hands.
Figure 5: Histogram of letter distribution in FS23K. The letters a (16,577) and e (13,754) occur most frequently, whereas q (143) and x (322) appear least often. This imbalance reflects the natural distribution of letters in in-the-wild BBC broadcast data.
...and 10 more figures
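The Figure 4 caption describes per-frame letter labels being passed to a CTC decoder. As a minimal sketch of how such labels become a letter sequence, the block below applies the simplest CTC decoding rule (greedy, or best-path, collapse): merge consecutive repeats, then drop blanks. The blank symbol `_` and the function name are hypothetical, and the paper may use a different decoder, such as beam search.

```python
from itertools import groupby
from typing import Iterable

BLANK = "_"  # hypothetical blank symbol; the paper's label set is not given here


def ctc_greedy_collapse(frame_labels: Iterable[str]) -> str:
    """Collapse per-frame labels into a letter sequence using the CTC rule."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only serve to separate repeated letters.
    return "".join(label for label in merged if label != BLANK)
```

With this rule, the frame sequence `__bb_b__cc_` decodes to `bbc`: the blank between the two runs of `b` is what keeps them as separate letters.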
