Slovo: Russian Sign Language Dataset

Alexander Kapitanov; Karina Kvanchiani; Alexander Nagaev; Elizaveta Petrova

TL;DR

This paper presents Slovo, a Russian Sign Language (RSL) video dataset produced using crowdsourcing platforms. It also provides the entire dataset creation pipeline, from data collection to video annotation, along with a demo application.

Abstract

One of the main challenges of the sign language recognition task is the difficulty of collecting a suitable dataset, due to the gap between hard-of-hearing and hearing societies. In addition, sign languages differ significantly from country to country, so new data must be created for each of them. This paper presents Slovo, a Russian Sign Language (RSL) video dataset produced using crowdsourcing platforms. The dataset contains 20,000 FullHD recordings of isolated RSL gestures, divided into 1,000 classes and performed by 194 signers. We also provide the entire dataset creation pipeline, from data collection to video annotation, along with a demo application. Several neural networks are trained and evaluated on Slovo to demonstrate its suitability for training models. The proposed data and pre-trained models are publicly available.
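
The headline counts in the abstract imply the following averages. These are simple means only; the abstract does not state whether every class or every signer contributes the same number of videos.

```latex
\frac{20{,}000 \ \text{videos}}{1{,}000 \ \text{classes}} = 20 \ \text{videos per class (on average)},
\qquad
\frac{20{,}000 \ \text{videos}}{194 \ \text{signers}} \approx 103.1 \ \text{videos per signer (on average)}
```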

Paper Structure

This paper contains 9 sections, 5 figures, and 1 table.

Introduction
Related Work
Sign Language Datasets in the Russian Domain
Other Sign Language Datasets
Sign Language Dataset Collection
Dataset Creation
Dataset Description
Experiments
Conclusion

Figures (5)

Figure 1: RSL signs "at eight fifteen" (left top), "appetite" (left bottom), "yellow" (right top), and "this" (right bottom).
Figure 2: Crowdsourcing pipeline: collection, validation, and annotation. Each stage had its own rules, but the exam was the same for all stages.
Figure 3: Time interval aggregation pipeline. First, we split the begin and end timestamps into separate groups and independently compute the distances between all points within each group. If the maximum distance is less than 30 frames, we take the mean of each group as the final (begin, end) pair. Otherwise, the video is excluded from the dataset.
Figure 4: Analysis of video length, resolution, and user split. (a) Distribution of the number of frames per video in each set; (b) distribution of recorded videos per user in the train set and (c) in the test set; (d) video resolution ratio.
Figure 5: Mean accuracy achieved by each model on Slovo with different sampling strategies. Note that the graphs use different scales depending on the magnitude of the metrics.
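
The interval aggregation rule described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that the 30-frame check must pass for both the begin group and the end group, and that the means are left unrounded.

```python
from itertools import combinations
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# Threshold from the Figure 3 caption: a group is accepted only if its
# maximum pairwise distance is strictly less than 30 frames.
MAX_DISTANCE_FRAMES = 30


def max_pairwise_distance(points: Sequence[float]) -> float:
    """Largest absolute difference between any two timestamps in a group."""
    return max((abs(a - b) for a, b in combinations(points, 2)), default=0)


def aggregate_interval(
    annotations: Sequence[Tuple[float, float]],
    threshold: float = MAX_DISTANCE_FRAMES,
) -> Optional[Tuple[float, float]]:
    """Merge several annotators' (begin, end) frame pairs into one interval.

    Hypothetical helper. Returns None when annotators disagree too much,
    which corresponds to excluding the video from the dataset.
    """
    if not annotations:
        return None
    # Step 1: split begin and end timestamps into two independent groups.
    begins: List[float] = [begin for begin, _ in annotations]
    ends: List[float] = [end for _, end in annotations]
    # Step 2: check the spread of each group (assumption: both must pass).
    if max(max_pairwise_distance(begins), max_pairwise_distance(ends)) >= threshold:
        return None
    # Step 3: the mean of each group becomes the final (begin, end) pair.
    return mean(begins), mean(ends)
```

For example, three annotations (10, 100), (20, 110), and (15, 105) have a spread of 10 frames in both groups, so they merge to (15, 105). Two annotations whose begin times differ by 40 frames are rejected. Since the timestamps are one-dimensional, the maximum pairwise distance equals max minus min. The explicit pairwise form is kept only to mirror the caption's wording.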
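
The per-set and per-user statistics plotted in Figure 4 (a)–(c) amount to simple grouping and counting over video metadata. The sketch below assumes a hypothetical `VideoRecord` row with `user_id`, `split`, and `num_frames` fields. These names are illustrative only and are not Slovo's actual annotation schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class VideoRecord:
    # Hypothetical metadata row; field names are illustrative, not Slovo's schema.
    user_id: str
    split: str       # e.g. "train" or "test"
    num_frames: int


def frames_per_split(records: Iterable[VideoRecord]) -> Dict[str, List[int]]:
    """Frame counts grouped by split: the data behind a panel like (a)."""
    grouped: Dict[str, List[int]] = {}
    for record in records:
        grouped.setdefault(record.split, []).append(record.num_frames)
    return grouped


def videos_per_user(records: Iterable[VideoRecord], split: str) -> Counter:
    """Number of videos each user recorded within one split: panels like (b) and (c)."""
    return Counter(record.user_id for record in records if record.split == split)
```

Passing a list rather than a one-shot generator lets the same records feed both functions.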