Testing MediaPipe Holistic for Linguistic Analysis of Nonmanual Markers in Sign Languages
Anna Kuznetsova, Vadim Kimmelman
TL;DR
This work assesses whether MediaPipe Holistic (MPH) can support linguistic analysis of nonmanual markers in sign languages by comparing it to OpenFace (OF) on Kazakh-Russian Sign Language (KRSL) data and on a targeted data set of head tilts and eyebrow movements. The authors analyze eyebrow-position signals, including the distortions caused by head pitch, and find that MPH introduces complex, direction- and distance-dependent distortions that obscure genuine eyebrow patterns. OF shows different systematic distortions, which can be corrected with a previously proposed correction model. The study shows that MPH, in its current form, cannot be used directly for reliable linguistic analysis without substantial corrective modeling, and it highlights the need for robust, generalizable distortion-correction pipelines for computer-vision landmarks in sign-language research. These findings caution researchers against relying on uncorrected MPH output for analyses of nonmanual markers and motivate the development of tailored correction methods to enable scalable, automated linguistic research on sign languages.
Abstract
Advances in Deep Learning have made possible reliable landmark tracking of human bodies and faces that can be used for a variety of tasks. We test a recent Computer Vision solution, MediaPipe Holistic (MPH), to find out whether its tracking of facial features is reliable enough for linguistic analysis of sign language data, and compare it to an older solution, OpenFace (OF). We use an existing data set of sentences in Kazakh-Russian Sign Language and a newly created small data set of videos with head tilts and eyebrow movements. We find that MPH does not perform well enough for linguistic analysis of eyebrow movement, but that it fails in a different way from OF, which also performs poorly without correction. We reiterate a previous proposal to train additional correction models to overcome these limitations.
