The NGT200 Dataset: Geometric Multi-View Isolated Sign Recognition
Oline Ranum, David R. Wessels, Gomer Otterspeer, Erik J. Bekkers, Floris Roelofsen, Jari I. Andersen
TL;DR
This work introduces the NGT200 dataset to study multi-view isolated sign recognition (MV-ISR) with explicit emphasis on geometric and 3D-aware representations. It shows that MV-ISR is distinct from single-view ISR (SV-ISR): recognition is sensitive to view angle and benefits from multiple views, synthetic data, and geometric inductive biases. Methodologically, the study benchmarks a Sign Language Graph Convolution Network (SL-GCN) on reduced pose graphs and then advances to an SE(2)-equivariant temporal-PONITA model, which achieves higher accuracy and stability. The findings suggest that multi-view pose-based approaches, augmented with synthetic data and geometry-informed models, offer a scalable and privacy-preserving pathway toward robust sign language recognition, with implications for real-world MV-SLR systems and future dataset expansions.
Abstract
Sign Language Processing (SLP) provides a foundation for a more inclusive future in language technology; however, the field faces several significant challenges that must be addressed to achieve practical, real-world applications. This work addresses multi-view isolated sign recognition (MV-ISR) and highlights the essential role of 3D awareness and geometry in SLP systems. We introduce the NGT200 dataset, a novel spatio-temporal multi-view benchmark, establishing MV-ISR as distinct from single-view ISR (SV-ISR). We demonstrate the benefits of synthetic data and propose conditioning sign representations on spatial symmetries inherent in sign language. Leveraging an SE(2)-equivariant model improves MV-ISR performance by 8%-22% over the baseline.
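To make the notion of SE(2) symmetry concrete, the following minimal sketch applies a planar rotation and translation to a set of 2D keypoints and checks that their pairwise distances are unchanged. It is an illustration of the symmetry only, not the temporal-PONITA model or any part of the NGT200 pipeline; the keypoint coordinates and the helper names `se2_transform` and `pairwise_distances` are assumptions made for this example.

```python
import numpy as np

def se2_transform(points, theta, translation):
    """Rotate 2D points by theta (radians), then translate them."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T + np.asarray(translation)

def pairwise_distances(points):
    """Return the (N, N) matrix of Euclidean distances between points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Hypothetical 2D keypoints (illustrative values, not taken from NGT200).
keypoints = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [-0.5, 1.5]])

# Apply an arbitrary element of SE(2): rotation by 0.7 rad plus a shift.
moved = se2_transform(keypoints, theta=0.7, translation=[3.0, -1.0])

# The raw coordinates change, but the pairwise distances do not.
coordinates_changed = not np.allclose(keypoints, moved)
distances_preserved = np.allclose(
    pairwise_distances(keypoints), pairwise_distances(moved)
)
print(coordinates_changed, distances_preserved)
```

Features built from such invariant quantities do not change when the whole pose is rotated or shifted in the image plane, which is one intuition for why respecting this symmetry could help when the same sign is observed under different conditions.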
