Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets

Ahmet Alp Kindiroglu, Ozgur Kara, Ogulcan Ozdemir, Lale Akarun

TL;DR

This work tackles cross-dataset isolated sign language recognition (SLR) for under-resourced languages. It establishes a public cross-dataset transfer learning benchmark built from two Turkish datasets, BSign22k and AUTSL, which share 57 signs. The pipeline is a coordinate-based SL-GCN fed with OpenPose joint coordinates, and five supervised transfer methods are evaluated under closed-set and partial-set transfer scenarios. Several specialized supervised transfer approaches can outperform finetuning: minimum class confusion (MCC), joint adaptation networks (JAN), domain-specific batch normalization (DSBN), and domain-adversarial neural networks (DANN). The advantage is largest when target data is scarce or when shared labels exist. Partial-set transfer also benefits from larger source vocabularies. Overall, the paper provides a replicable benchmark and demonstrates gains over finetuning for cross-dataset SLR, supporting research on under-resourced sign languages and related video classification tasks.
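
The SL-GCN pipeline treats the OpenPose joints of each frame as nodes of a skeleton graph. The sketch below shows one simplified spatial graph convolution, ReLU(Â X W) with Â = D^-1/2 (A + I) D^-1/2. It is not the exact SL-GCN layer, which is more elaborate and also convolves over time. The 5-joint skeleton, its edges, the coordinates, and the weight matrix are all hypothetical illustration values.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    # Start from the identity so every joint also keeps its own features.
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    degree = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(frame, adj, weight):
    """ReLU(A_hat @ X @ W) for one frame: X is joints x channels, W is channels x out."""
    out = matmul(matmul(adj, frame), weight)
    return [[max(0.0, value) for value in row] for row in out]


# Hypothetical 5-joint upper-body subset:
# 0 neck, 1 right shoulder, 2 right elbow, 3 right wrist, 4 left shoulder.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4)]
A_HAT = normalized_adjacency(5, EDGES)

# One frame of (x, y, confidence) per joint, the triple OpenPose reports per keypoint.
frame = [
    [0.50, 0.30, 0.95],
    [0.40, 0.32, 0.90],
    [0.35, 0.45, 0.88],
    [0.33, 0.58, 0.80],
    [0.60, 0.32, 0.91],
]
# Illustrative 3 -> 2 projection; a real network learns these weights.
weight = [[1.0, -0.5], [0.5, 1.0], [0.2, 0.1]]
features = spatial_graph_conv(frame, A_HAT, weight)
```

Each joint's output mixes its own coordinates with those of its skeletal neighbours. Stacking such layers, interleaved with temporal convolutions over frames, is what lets a graph network produce one classification per video.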

Abstract

Sign language recognition (SLR) has recently achieved a breakthrough in performance thanks to deep neural networks trained on large annotated sign datasets. However, such annotated datasets are available for only a select few of the many sign languages. Since acquiring gloss-level labels on sign language videos is difficult, learning by transferring knowledge from existing annotated sources is useful for recognition in under-resourced sign languages. This study provides a publicly available cross-dataset transfer learning benchmark from two existing public Turkish SLR datasets. We use a temporal graph convolution-based sign language recognition approach to evaluate five supervised transfer learning approaches and experiment with closed-set and partial-set cross-dataset transfer learning. Experiments demonstrate that improvement over finetuning-based transfer learning is possible with specialized supervised transfer learning methods.
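
The closed-set and partial-set settings differ only in which glosses each domain keeps. The sketch below assumes the standard definitions. In the closed set, both domains use the shared glosses. In the partial set, the source keeps its full vocabulary and the target label space is a subset of it. The gloss names and helper functions are hypothetical. The real benchmark shares 57 signs.

```python
def build_label_spaces(source_vocab, target_vocab, setting):
    """Return (source_classes, target_classes) for a transfer setting.

    closed:  both domains use only the glosses they share.
    partial: the source keeps its full vocabulary while the target uses only
             the shared glosses, so the target label space is a subset of the source's.
    """
    shared = sorted(set(source_vocab) & set(target_vocab))
    if setting == "closed":
        return shared, shared
    if setting == "partial":
        return sorted(set(source_vocab)), shared
    raise ValueError(f"unknown setting: {setting!r}")


def filter_samples(samples, classes):
    """Keep the (video_id, gloss) pairs whose gloss belongs to the allowed classes."""
    allowed = set(classes)
    return [(video_id, gloss) for video_id, gloss in samples if gloss in allowed]


# Hypothetical vocabularies for illustration only.
source_vocab = ["book", "friend", "home", "school", "water"]
target_vocab = ["book", "home", "teacher", "water"]

closed_src, closed_tgt = build_label_spaces(source_vocab, target_vocab, "closed")
partial_src, partial_tgt = build_label_spaces(source_vocab, target_vocab, "partial")

# Target samples whose gloss is not shared ("teacher") are dropped.
target_train = filter_samples(
    [("t1", "book"), ("t2", "teacher"), ("t3", "water")], partial_tgt
)
```

Under this reading, a larger source vocabulary only changes `partial_src`. The target task stays the same, which makes partial-set results directly comparable across source datasets of different sizes.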

Paper Structure

This paper contains 12 sections, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Representative image of the proposed SL-GCN architecture. The proposed baseline method takes joint coordinates as input and outputs a classification result for each video.
  • Figure 2: Applying gradient reversal to the domain prediction loss improves sign language recognition accuracy across the source and target domains.
  • Figure 3: Domain-specific batch normalization layers are learned so that batches from each domain are normalized separately.
  • Figure 4: The JMMD (joint maximum mean discrepancy) loss is used to learn domain-specific weights for the final layers of the network.
  • Figure 5: Calculation of the minimum class confusion loss for supervised SLR. The class confusion matrix is calculated by multiplying the logit matrix by its transpose. Confident predictions increase the weight of the values on the matrix's diagonal.
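
Gradient reversal, as in Figure 2, is the core of DANN. The layer is the identity in the forward pass and multiplies gradients by -λ in the backward pass. The sketch below writes the forward and backward steps by hand so the sign flip is visible. The class and function names and the value λ = 0.5 are assumptions for illustration. In an autograd framework, the same behaviour is usually written as a custom function whose backward negates the gradient.

```python
class GradientReversal:
    """Identity in the forward pass; scales incoming gradients by -lambda on the way back."""

    def __init__(self, lambd=1.0):
        self.lambd = lambd

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_output):
        # The domain classifier's gradient is flipped before reaching the extractor.
        return [-self.lambd * g for g in grad_output]


def feature_extractor_grad(grad_from_label_loss, grad_from_domain_loss, grl):
    """Gradient reaching the shared feature extractor in a DANN-style network.

    The sign classifier's gradient passes through unchanged, while the domain
    classifier's gradient is reversed. The domain classifier still learns to
    separate domains, but the features are pushed to make that task harder.
    """
    reversed_domain = grl.backward(grad_from_domain_loss)
    return [a + b for a, b in zip(grad_from_label_loss, reversed_domain)]


grl = GradientReversal(lambd=0.5)
features = grl.forward([0.7, -1.2])
grad = feature_extractor_grad([0.2, -0.1], [0.4, 0.4], grl)
```

A larger λ trades recognition accuracy for domain invariance. DANN implementations commonly ramp λ up during training rather than fixing it.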
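
Domain-specific batch normalization, as in Figure 3, keeps one set of normalization statistics and affine parameters per domain, while all other network weights are shared. The sketch below handles a single channel of scalar activations for brevity. It uses the biased batch variance, and its `momentum` and `eps` defaults are assumptions that mirror common batch-norm conventions.

```python
import math


class DomainSpecificBatchNorm:
    """Batch normalization with separate statistics and affine parameters per domain.

    Works on one channel of scalar activations to keep the sketch short.
    """

    def __init__(self, domains, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.gamma = {d: 1.0 for d in domains}
        self.beta = {d: 0.0 for d in domains}
        self.running_mean = {d: 0.0 for d in domains}
        self.running_var = {d: 1.0 for d in domains}

    def __call__(self, batch, domain, training=True):
        if training:
            # Normalize with this batch's statistics and update only this domain's
            # running estimates, leaving the other domain's statistics untouched.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.running_mean[domain] = (1 - m) * self.running_mean[domain] + m * mean
            self.running_var[domain] = (1 - m) * self.running_var[domain] + m * var
        else:
            mean, var = self.running_mean[domain], self.running_var[domain]
        scale = self.gamma[domain] / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta[domain] for x in batch]


dsbn = DomainSpecificBatchNorm(domains=["source", "target"])
src_out = dsbn([1.0, 2.0, 3.0, 4.0], domain="source")
tgt_out = dsbn([10.0, 20.0, 30.0, 40.0], domain="target")
```

The two batches live on very different scales, yet both come out zero-centred with nearly identical values. The point of the design is that a scale gap between datasets is absorbed by the per-domain statistics instead of by the shared weights.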
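
The JMMD loss from Figure 4 compares the joint distribution of several layers' activations across domains. The sketch below computes a biased JMMD estimate: for each pair of samples it multiplies per-layer Gaussian kernels, then combines within-source, within-target, and cross-domain averages. JAN itself uses multi-kernel combinations on feature layers. The single Gaussian per layer, the bandwidths, and the toy activations below are assumptions.

```python
import math


def gaussian_kernel(a, b, sigma):
    """exp(-||a - b||^2 / (2 sigma^2)) for two vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def joint_kernel(layers_a, layers_b, i, j, sigmas):
    """Product over layers of the kernel between sample i of A and sample j of B."""
    value = 1.0
    for layer_a, layer_b, sigma in zip(layers_a, layers_b, sigmas):
        value *= gaussian_kernel(layer_a[i], layer_b[j], sigma)
    return value


def jmmd(source_layers, target_layers, sigmas):
    """Biased JMMD estimate; each argument is a list of layers of sample vectors."""
    n_s, n_t = len(source_layers[0]), len(target_layers[0])
    ss = sum(joint_kernel(source_layers, source_layers, i, j, sigmas)
             for i in range(n_s) for j in range(n_s)) / n_s ** 2
    tt = sum(joint_kernel(target_layers, target_layers, i, j, sigmas)
             for i in range(n_t) for j in range(n_t)) / n_t ** 2
    st = sum(joint_kernel(source_layers, target_layers, i, j, sigmas)
             for i in range(n_s) for j in range(n_t)) / (n_s * n_t)
    return ss + tt - 2.0 * st


# Hypothetical activations: layer 0 = features, layer 1 = class probabilities.
source = [[[0.0, 0.0], [0.1, 0.0]], [[0.9, 0.1], [0.8, 0.2]]]
shifted_target = [[[2.0, 2.0], [2.1, 2.0]], [[0.2, 0.8], [0.3, 0.7]]]
sigmas = [1.0, 0.5]

same_domain = jmmd(source, source, sigmas)
shifted = jmmd(source, shifted_target, sigmas)
```

When the target activations match the source, the estimate is zero. It grows as features and predictions drift apart jointly, so minimizing it pulls the final layers of the two domains together.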
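
The minimum class confusion loss in Figure 5 can be sketched as follows. The steps follow the general MCC recipe: temperature-scaled softmax, entropy-based sample weights so that confident predictions count more, a class confusion matrix built from the prediction matrix and its transpose, row normalization, and the mean off-diagonal mass. The caption multiplies the logit matrix by its transpose. Applying softmax first, the temperature of 2.5, and the function names are assumptions rather than confirmed details of this paper.

```python
import math


def softmax(row, temperature):
    """Temperature-scaled, numerically stable softmax of one logit vector."""
    scaled = [z / temperature for z in row]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mcc_loss(logits, temperature=2.5):
    """Minimum class confusion loss for a batch of logits (rows = samples)."""
    probs = [softmax(row, temperature) for row in logits]
    batch, num_classes = len(probs), len(probs[0])
    # Entropy-based weights: low-entropy (confident) samples get larger weights.
    entropies = [-sum(p * math.log(p + 1e-12) for p in row) for row in probs]
    raw = [1.0 + math.exp(-h) for h in entropies]
    weights = [batch * w / sum(raw) for w in raw]
    # Class confusion matrix C = Y^T W Y, of size num_classes x num_classes.
    conf = [[sum(weights[i] * probs[i][j] * probs[i][k] for i in range(batch))
             for k in range(num_classes)] for j in range(num_classes)]
    # Normalize each row, then average the off-diagonal (between-class) confusion.
    normalized = [[v / sum(row) for v in row] for row in conf]
    off_diag = sum(normalized[j][k] for j in range(num_classes)
                   for k in range(num_classes) if j != k)
    return off_diag / num_classes


confident = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
uncertain = [[1.0, 1.0, 1.0] for _ in range(3)]
confident_loss = mcc_loss(confident)
uncertain_loss = mcc_loss(uncertain)
```

Uniform predictions spread mass evenly across the confusion matrix and give a large loss. Confident, well-separated predictions concentrate mass on the diagonal and give a small one. How the paper combines this term with the classification loss is not stated here, so any weighting between the two would be a further assumption.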