3D-LEX v1.0: 3D Lexicons for American Sign Language and Sign Language of the Netherlands

Oline Ranum, Gomer Otterspeer, Jari I. Andersen, Robert G. Belleman, Floris Roelofsen

TL;DR

An efficient approach for capturing sign language in 3D is presented, the 3D-LEX v1.0 dataset is introduced, and a method for semi-automatic annotation of phonetic properties is detailed, to support studies in 3D-aware sign language processing.

Abstract

In this work, we present an efficient approach for capturing sign language in 3D, introduce the 3D-LEX v1.0 dataset, and detail a method for semi-automatic annotation of phonetic properties. Our procedure integrates three motion capture techniques encompassing high-resolution 3D poses, 3D handshapes, and depth-aware facial features, and attains an average sampling rate of one sign every 10 seconds. This includes the time for presenting a sign example, performing and recording the sign, and archiving the capture. The 3D-LEX dataset includes 1,000 signs from American Sign Language and an additional 1,000 signs from the Sign Language of the Netherlands. We demonstrate the dataset's utility by presenting a simple method for generating handshape annotations directly from 3D-LEX. We produce handshape labels for 1,000 signs from American Sign Language and evaluate the labels in a sign recognition task. The labels improve gloss recognition accuracy by 5% over using no handshape annotations, and by 1% over expert annotations. Our motion capture data supports in-depth analysis of sign features and facilitates the generation of 2D projections from any viewpoint. The 3D-LEX collection has been aligned with existing sign language benchmarks and linguistic resources to support studies in 3D-aware sign language processing.
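
The abstract notes that the motion capture data facilitates generating 2D projections from any viewpoint. As a rough illustration of that idea, the sketch below rotates 3D joint positions about the vertical axis and applies a simple pinhole projection. The function name `project_points`, the axis convention (y up, camera on the z axis), and the `camera_distance` and `focal_length` parameters are all assumptions made for this example; they do not describe the dataset's file format or the authors' tooling.

```python
import math

def project_points(points, yaw_deg=0.0, camera_distance=3.0, focal_length=1.0):
    """Rotate 3D points about the vertical (y) axis, then apply a pinhole projection.

    points: iterable of (x, y, z) tuples, with y pointing up (assumed convention).
    Returns a list of (u, v) image-plane coordinates.
    """
    yaw = math.radians(yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points:
        # Rotation about the y axis changes x and z, and leaves y untouched.
        xr = cos_y * x + sin_y * z
        zr = -sin_y * x + cos_y * z
        # The (hypothetical) camera sits on the z axis, camera_distance from the origin.
        depth = zr + camera_distance
        if depth <= 0:
            raise ValueError("point lies behind the camera")
        projected.append((focal_length * xr / depth, focal_length * y / depth))
    return projected

# One made-up joint position, viewed from the front and from the side.
wrist = (0.5, 1.0, 0.0)
front = project_points([wrist], yaw_deg=0)
side = project_points([wrist], yaw_deg=90)
```

Changing `yaw_deg` moves the virtual viewpoint around the signer, so one 3D capture can yield several 2D views of the same sign.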

Paper Structure (33 sections, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Motion capture techniques: The NGT sign 'mango' captured with the three collection techniques. Left: Pose data captured with Vicon Motion Capture displayed in Shogun Live; Top right: face features captured with Live Link Face (Epic Games); Bottom right: handshapes captured with gloves displayed in Hand Engine (StretchSense).
  • Figure 2: Setup of the Vicon detection zone: The illustration indicates the placement of the Vero Cameras on the rig and in front of the signer.
  • Figure 3: Marker layout for the Vicon system: Layout according to FrontWaist 53-marker set template, displayed on signer in Shogun Live.
  • Figure 4: Distributions of handshapes in the 3D-LEX vocabulary: the distribution of handshapes as identified by (a) human experts and (b) the automated annotation process described in Section 4.1. The automatic annotations assign arbitrary cluster IDs to groups of handshapes determined by K-means clustering. These cluster IDs do not necessarily correspond to the linguistic labels used by human experts in Subfigure 4(a).
  • Figure 5: Time-series visualization of handshape classification: Classification of the ASL sign 'zero', labeled by experts with the handshape 'o'. Frames are captured and displayed as bars, and each bar's color indicates the handshape, determined by applying the Euclidean distance method frame-by-frame. White space indicates that no data was recorded at that time. The timeline, marked on the x-axis, spans four seconds for this sign. A detailed view at the 1-second mark is provided in the lower row for closer inspection. Our segmentation pipeline identifies the handshapes '5', 'f', 'c', and 'o', selecting frames corresponding to 'o' as the characteristic signal of 'zero'.
  • ...and 2 more figures
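
The Figure 5 caption describes labeling each frame by Euclidean distance and then reading handshape segments off the resulting timeline. A minimal sketch of that general idea follows: each frame's feature vector is assigned to the nearest reference handshape, frames without data stay unlabeled, and consecutive identical labels are merged into runs. The reference vectors, the three-dimensional features, and the names `REFERENCES`, `classify_frame`, and `segment` are hypothetical, and the rule the paper uses to pick the characteristic handshape is not reproduced here.

```python
import math
from itertools import groupby

# Hypothetical reference handshapes: label -> feature vector
# (for example, finger bend values). The numbers are invented for illustration.
REFERENCES = {
    "5": [0.0, 0.0, 0.0],
    "c": [0.5, 0.5, 0.5],
    "o": [0.9, 0.9, 0.9],
}

def classify_frame(frame, references=REFERENCES):
    """Return the label of the nearest reference by Euclidean distance.

    A frame of None stands for a moment where no data was recorded.
    """
    if frame is None:
        return None
    return min(references, key=lambda label: math.dist(frame, references[label]))

def segment(frames, references=REFERENCES):
    """Label every frame, then merge consecutive equal labels into runs.

    Returns a list of (label, first_frame_index, last_frame_index) tuples.
    """
    labels = [classify_frame(frame, references) for frame in frames]
    runs, start = [], 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, start, start + length - 1))
        start += length
    return runs

# Five made-up frames; the third one is missing.
frames = [[0.1, 0.0, 0.1], [0.4, 0.5, 0.6], None, [0.8, 0.9, 1.0], [0.9, 0.9, 0.9]]
runs = segment(frames)
print(runs)
```

Each run corresponds to one colored stretch of bars in a timeline like the one in Figure 5, and runs labeled `None` correspond to the white gaps.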