Modelling the Distribution of Human Motion for Sign Language Assessment

Oliver Cory, Ozge Mercanoglu Sincan, Matthew Vowels, Alessia Battisti, Franz Holzknecht, Katja Tissi, Sandra Sidler-Miserez, Tobias Haug, Sarah Ebling, Richard Bowden

TL;DR

This work addresses Sign Language Assessment (SLA) for continuous sign language by modelling the natural distribution of human motion across multiple native signers. It introduces a SkeletonVAE to embed 3D skeletal poses, selects a representative reference per sentence, and builds a Motion Envelope with Gaussian Processes to quantify how far a learner deviates from that distribution. On a Sentence Repetition Test dataset, the approach's scores correlate strongly with human ratings, and it supports spatio-temporal anomaly detection for targeted feedback. The method offers interpretable, probabilistic assessments and lays the groundwork for extending SLA systems to non-manual features.

Abstract

Sign Language Assessment (SLA) tools are useful aids to language learning, yet they remain underdeveloped. Previous work has focused on isolated signs or on comparison against a single reference video to assess Sign Languages (SL). This paper introduces a novel SLA tool designed to evaluate the comprehensibility of SL by modelling the natural distribution of human motion. We train our pipeline on data from native signers and evaluate it using SL learners. We compare our results to ratings from a human rater study and find strong correlation between human ratings and our tool. We visually demonstrate our tool's ability to detect anomalous results spatio-temporally, providing actionable feedback to aid in SL learning and assessment.
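The comparison between tool output and human ratings can be illustrated with a minimal sketch. It assumes one scalar score per learner from the tool and one mean rating per learner from the raters; the names `tool_scores` and `human_ratings` and all values are hypothetical toy data, not taken from the paper. Both series are standardised (z-scored) and the Pearson correlation is computed. For standardised variables, the least-squares line of best fit passes through the origin with slope equal to that correlation.

```python
from statistics import mean, pstdev

def standardise(values):
    """Z-score a list of numbers (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Pearson correlation, computed as the mean product of z-scores."""
    zx, zy = standardise(xs), standardise(ys)
    return mean(a * b for a, b in zip(zx, zy))

# Hypothetical toy values, one entry per learner (NOT data from the paper).
tool_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
human_ratings = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(tool_scores, human_ratings)
slope = r          # regression slope of standardised y on standardised x
intercept = 0.0    # standardised variables both have zero mean
print(round(r, 3), slope, intercept)
```

The sign and scale of the real tool's score depend on how deviation is defined in the paper, so only the standardise-then-correlate pattern should be read from this sketch.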

Paper Structure

This paper contains 20 sections, 4 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Diagram showing the training pipeline for modelling the $j^{th}$ sentence over $K$ signers. The process takes $J$ example sentences captured with $C$ independent cameras and uses 3D pose uplift to create a set of poses $\mathbf{x}$, which are fed into the VAE and encoded into latent means $\boldsymbol{\hat{\mu}}$. Reference Selection finds the central signal $\boldsymbol{\hat{\mu}}_{ref}$ and learns a distribution over the $K$ signers.
  • Figure 2: Example frame from the dataset showing (a) the RGB frame of a participant from one of the camera views, (b) the uplifted 3D skeleton, and (c) the bone-length-adjusted canonical skeleton.
  • Figure 3: Standardised PD measures plotted against the standardised manual ratings for Sentence A. The blue points represent the language learners who produced the sentence, each labelled with their predefined signer ID. The black line is the line of best fit from the linear regression.
  • Figure 4: The top plot shows a section from the latent dimension of the Motion Envelope Confidence Region, with encoded SkeletonVAE signals overlaid for Sentence A. Below, decoded pose data for the latents is visualised for Learner 5 (top), Learner 13 (middle) and one Native Signer (bottom) for Pose Numbers 165-185 in steps of 5. The red circle indicates Learner 5's peak deviation from the distribution.
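
The idea behind the confidence region in Figure 4 can be sketched in a few lines. The paper builds its Motion Envelope with Gaussian Processes; the sketch below deliberately swaps in a much simpler stand-in, a per-frame mean with a band of plus or minus `z` standard deviations over time-aligned native-signer signals. The function names, the single latent dimension, the `z = 1.96` threshold, and all values are assumptions made for illustration only.

```python
from statistics import mean, stdev

def build_envelope(native_signals, z=1.96):
    """Per-frame (lower, centre, upper) band from time-aligned signals.

    Simplified stand-in for a Gaussian Process envelope: each frame is
    treated independently as a Gaussian over the native signers.
    """
    envelope = []
    for frame_values in zip(*native_signals):
        m = mean(frame_values)
        s = stdev(frame_values)
        envelope.append((m - z * s, m, m + z * s))
    return envelope

def flag_anomalies(learner_signal, envelope):
    """Indices of frames where the learner falls outside the band."""
    return [
        t
        for t, (value, (lower, _, upper)) in enumerate(zip(learner_signal, envelope))
        if value < lower or value > upper
    ]

# Hand-made 1-D latent signals, one list per native signer (toy values).
native_signals = [
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.1, 0.6, 1.1, 0.6, 0.1],
    [-0.1, 0.4, 0.9, 0.4, -0.1],
]
# Toy learner signal with an exaggerated deviation at frame 2.
learner_signal = [0.0, 0.5, 2.0, 0.5, 0.0]

envelope = build_envelope(native_signals)
anomalous_frames = flag_anomalies(learner_signal, envelope)
print(anomalous_frames)
```

Unlike this pointwise band, a Gaussian Process also models correlation between neighbouring frames, so it yields a smooth envelope and does not require every signal to be sampled at identical time steps.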