iSign: A Benchmark for Indian Sign Language Processing

Abhinav Joshi, Romit Mohanty, Mounika Kanakanti, Andesha Mangla, Sudeep Choudhary, Monali Barbate, Ashutosh Modi

TL;DR

This work tackles the scarcity of Indian Sign Language resources by introducing iSign, a large ISL–English benchmark dataset with 118,228 video–translation pairs and five benchmark tasks. It combines three standard tasks (translation and pose generation, word/gloss recognition) with two representation-learning tasks (word presence and semantic similarity) and provides baseline models and linguistic analysis to guide future work. The dataset is built from multiple public ISL sources, validated by certified signers, and accompanied by detailed discussions of data collection, alignment challenges, and evaluation metrics. The results reveal substantial room for improvement and underscore the need for ISL-specific architectures and evaluation strategies, highlighting the practical potential of standardized benchmarks to accelerate ISL NLP research.

Abstract

Indian Sign Language (ISL) has limited resources for developing machine learning and data-driven approaches to automated language processing. Although text- and audio-based language processing has attracted enormous research interest and improved dramatically in the last few years, sign languages still lag behind because of a lack of resources. To bridge this gap, we propose iSign: a benchmark for Indian Sign Language Processing. We make three primary contributions in this work. First, we release one of the largest ISL-English datasets, with more than 118K video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with baseline models to make them easier for the research community to access. Third, we provide detailed insights into the proposed benchmarks, along with a few linguistic insights into the workings of ISL. We streamline the evaluation of sign language processing, addressing gaps in the NLP research community for sign languages. We release the dataset, tasks, and models via the following website: https://exploration-lab.github.io/iSign/

Paper Structure

This paper contains 12 sections, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: iSign Benchmark: The proposed benchmark for Indian Sign Language Processing.
  • Figure 2: An example showing the translation of the phrase "What, Where, How, and When" in Indian Sign Language. The text box length overlaps with the signs with a pause position in between.
  • Figure 3: The figure shows the distribution of the number of words in the target translation text.
  • Figure 4: The figure shows an example of the educational content video where the signer signs for the corresponding textbook.
  • Figure 5: The figure shows an example of a frame from the iSign dataset video with the corresponding extracted keypoints.
  • ...and 1 more figure