RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition

Cătălin-Alexandru Rîpanu, Andrei-Theodor Hotnog, Giulia-Stefania Imbrea, Dumitru-Clementin Cercel

TL;DR

This work addresses the lack of a standardized dataset for Romanian Isolated Sign Language Recognition (RoISLR) by introducing RoCoISLR, a multi-source corpus of over 9,000 videos with standardized glosses. It benchmarks seven state-of-the-art video recognition models under a controlled setup and compares the results with those on WLASL2000. Transformer-based architectures perform best, with Swin Transformer reaching a Top-1 accuracy of 34.1% on RoCoISLR. The results underscore significant long-tail and cross-domain generalization challenges in low-resource sign languages. Overall, RoCoISLR provides a reproducible foundation for RoISLR research and broadens evaluation beyond resource-rich languages.

Abstract

Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models (I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D) under consistent experimental setups, and compare their performance with that obtained on the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines; Swin Transformer achieved a Top-1 accuracy of 34.1%. Our benchmarks highlight the challenges associated with long-tail class distributions in low-resource sign languages, and RoCoISLR provides the initial foundation for systematic RoISLR research.
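
For readers unfamiliar with the Top-1 metric quoted above, the following is a minimal sketch of how Top-k accuracy is commonly computed from per-class scores. It is not the authors' evaluation code: the function name `top_k_accuracy`, the toy scores, and the three-class setup are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data (hypothetical): 4 video samples scored over 3 gloss classes.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.5, 0.3],
]
toy_labels = [0, 2, 0, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
```

With thousands of gloss classes and few samples per class, a prediction counts as correct under Top-1 only when the single highest-scoring class matches the label, which is why this metric is sensitive to long-tail class distributions.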

Paper Structure

This paper contains 8 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Examples of frames from the two data sources.
  • Figure 2: Histogram of the number of appearances of an individual sign in RoCoISLR.
  • Figure 3: Confusion matrix for the testing of Swin Transformer on RoCoISLR.