ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles

Kayo Yin, Chinmay Singh, Fyodor O. Minakov, Vanessa Milan, Hal Daumé, Cyril Zhang, Alex X. Lu, Danielle Bragg

TL;DR

ASL STEM Wiki is the first continuous signing dataset focused on STEM, facilitating the development of AI resources for STEM education in ASL. The authors also develop models to identify fingerspelled words, which can later be used to query for appropriate ASL signs to suggest to interpreters.

Abstract

Deaf and hard-of-hearing (DHH) students face significant barriers in accessing science, technology, engineering, and mathematics (STEM) education, notably due to the scarcity of STEM resources in signed languages. To help address this, we introduce ASL STEM Wiki: a parallel corpus of 254 Wikipedia articles on STEM topics in English, interpreted into over 300 hours of American Sign Language (ASL). ASL STEM Wiki is the first continuous signing dataset focused on STEM, facilitating the development of AI resources for STEM education in ASL. We identify several use cases of ASL STEM Wiki with human-centered applications. For example, because this dataset highlights the frequent use of fingerspelling for technical concepts, which inhibits DHH students' ability to learn, we develop models to identify fingerspelled words, which can later be used to query for appropriate ASL signs to suggest to interpreters.

Paper Structure

This paper contains 32 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: One use case of ASL STEM Wiki is automatic sign suggestion. Given an English sentence and a video of its ASL interpretation, the model detects all clips of ASL that contain fingerspelling (FS). Then, given a detected FS clip and the English sentence, the model identifies which English phrase in the sentence is fingerspelled in the clip. That English phrase can be used to query an ASL lexicon and suggest ASL signs.
  • Figure 2: The bilingual resource used to both collect and display ASL STEM Wiki. The design was proposed in glasser2022asl. Contributors select a sentence that they would like to interpret, which activates their webcam for recording. Consumers can read articles in English and access ASL interpretations for desired sentences.
  • Figure 3: Scatterplot of video size. x-axis: sentence length (characters), y-axis: video length (seconds).
  • Figure 4: Fingerspelling detection model. Frames of ASL keypoints are processed by a temporal graph convolutional network, and the associated English sentence is processed by the CANINE pre-trained language model. The two representations are concatenated and then passed to a linear layer to predict fingerspelling frames.
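
The captions for Figures 1 and 4 describe a model that predicts fingerspelling at the frame level, while the sign-suggestion use case works on fingerspelling clips. One plausible way to bridge the two is to group consecutive positive frames into clips. The sketch below illustrates that idea only; the function name `frames_to_clips`, the 0.5 threshold, and the `min_len` filter are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def frames_to_clips(
    frame_probs: List[float],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
    min_len: int = 1,        # assumed minimum clip length in frames
) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring >= threshold into (start, end) clips.

    `end` is exclusive, so a clip covers frames start .. end - 1.
    """
    clips: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the clip being built, if any
    for i, prob in enumerate(frame_probs):
        if prob >= threshold:
            if start is None:
                start = i  # a new run of fingerspelling frames begins
        elif start is not None:
            # The run ended at frame i - 1; keep it if it is long enough.
            if i - start >= min_len:
                clips.append((start, i))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frame_probs) - start >= min_len:
        clips.append((start, len(frame_probs)))
    return clips


if __name__ == "__main__":
    # Hypothetical per-frame fingerspelling scores for a six-frame video.
    scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]
    print(frames_to_clips(scores))
    print(frames_to_clips(scores, min_len=2))
```

Raising `min_len` discards very short runs, which is a simple way to suppress isolated false-positive frames before a clip is matched against the English sentence.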