E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods

Şükrü Öztürk, Hacer Yalim Keles

TL;DR

This paper presents E-TSL, a continuous Turkish Sign Language dataset drawn from educational videos, whose vocabulary is dominated by rare words because Turkish is agglutinative. It proposes two transformer-based baselines for sign-to-text translation, P2T-T and GNN-T, both built on MediaPipe landmarks, with GNN-T additionally modeling the pose as a graph. On E-TSL, GNN-T achieves the higher BLEU scores while P2T-T achieves the higher ROUGE-L; both models are also evaluated on the PHOENIX14T benchmark. The BLEU gains point to the value of graph-based representations for continuous sign translation. The work establishes a foundation for Turkish SL translation in educational contexts and suggests directions for improvement, such as sentence-level segmentation and alternative pose representations.

Abstract

This study introduces the continuous Educational Turkish Sign Language (E-TSL) dataset, collected from online Turkish language lessons for 5th, 6th, and 8th grades. The dataset comprises 1,410 videos totaling nearly 24 hours and includes performances from 11 signers. Turkish, an agglutinative language, poses unique challenges for sign language translation: 64% of the vocabulary consists of singleton words, and 85% consists of rare words that appear fewer than five times. We developed two baseline models to address these challenges: the Pose to Text Transformer (P2T-T) and the Graph Neural Network based Transformer (GNN-T). The GNN-T model achieved a BLEU-1 score of 19.13% and a BLEU-4 score of 3.28%, which indicates that E-TSL is significantly more challenging than existing benchmarks. The P2T-T model, while scoring slightly lower on BLEU, achieved a higher ROUGE-L score of 22.09%. Additionally, we benchmarked our models on the well-known PHOENIX-Weather 2014T dataset to validate our approach.
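The singleton and rare-word shares quoted in the abstract can be illustrated with a short sketch. It assumes whitespace tokenization and lowercasing, which may differ from the preprocessing the authors used; the function name `vocab_rarity`, the `rare_below` parameter, and the toy sentences are hypothetical and are not taken from E-TSL.

```python
from collections import Counter


def vocab_rarity(sentences, rare_below=5):
    """Return (singleton_share, rare_share) over the vocabulary of `sentences`.

    Assumption: words are split on whitespace after lowercasing.
    A word is a singleton if it occurs exactly once, and rare if it
    occurs fewer than `rare_below` times (singletons count as rare).
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab_size = len(counts)
    if vocab_size == 0:
        return 0.0, 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    rare = sum(1 for c in counts.values() if c < rare_below)
    return singletons / vocab_size, rare / vocab_size


# Toy Turkish sentences (illustrative only): the inflected forms
# "evler", "evde", "evden" each count as a separate vocabulary entry.
toy = ["ev ev ev ev ev", "evler evde evden", "okul okul"]
singleton_share, rare_share = vocab_rarity(toy)
print(f"singletons: {singleton_share:.0%}, rare: {rare_share:.0%}")
```

In the toy input, each inflected form of "ev" is counted as its own word, which is how an agglutinative language inflates the share of rare vocabulary entries.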

Paper Structure (17 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Word Distribution of E-TSL Dataset
  • Figure 2: Sample Images from PHOENIX14T Dataset (Camgoz et al., 2018)
  • Figure 3: Architecture of the GNN-T Model