Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns

Klervi Le Gall, Lise Bellanger, David Laplaud, Aymeric Stamm

TL;DR

The paper proposes a comprehensive framework for transforming unit quaternion time series (QTS) data into a form that preserves the essential geometric properties of gait while enabling the use of any tabular synthetic data generation method. It also introduces a synthetic data generation method, based on nearest-neighbor weighting, that produces high-fidelity synthetic QTS data suitable for small datasets and private data environments.

Abstract

Multiple sclerosis (MS) is the leading cause of severe non-traumatic disability in young adults, and its incidence is increasing worldwide. The variability of gait impairment in MS necessitates the development of a non-invasive, sensitive, and cost-effective tool for quantitative gait evaluation. The eGait movement sensor, designed to characterize human gait through unit quaternion time series (QTS) representing hip rotations, is a promising approach. However, the small sample sizes typical of clinical studies pose challenges for the stability of gait data analysis tools. To address these challenges, this article presents two key scientific contributions. First, a comprehensive framework is proposed for transforming QTS data into a form that preserves the essential geometric properties of gait while enabling the use of any tabular synthetic data generation method. Second, a synthetic data generation method is introduced, based on nearest-neighbor weighting, which produces high-fidelity synthetic QTS data suitable for small datasets and private data environments. The effectiveness of the proposed method is demonstrated through its application to MS gait data, showing very good fidelity and preservation of the initial geometry of the data. Thanks to this work, we are able to produce synthetic datasets and to study the stability of clustering methods.

Paper Structure

This paper contains 27 sections, 1 theorem, 17 equations, 9 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Let $F \in \mathcal{F}(F_1, \dots, F_p)$ be a $p$-dimensional distribution function with marginals $F_1, \dots, F_p$. Then there exists a copula $C$, that is, a $p$-dimensional distribution function on $[0,1]^p$ with uniform marginals, such that $F(x_1, \dots, x_p) = C(F_1(x_1), \dots, F_p(x_p))$ for all $(x_1, \dots, x_p) \in \mathbb{R}^p$.
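
Sklar's theorem separates a joint distribution into its marginals and a copula that carries the dependence, which is what makes copula-based tabular generators possible. The following is a minimal two-variable sketch of that idea, not the paper's implementation: it assumes a Gaussian copula with a user-supplied correlation `rho` and empirical marginals, and the function names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF, used to map Gaussian draws to uniforms."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_quantile(sorted_values, u):
    """Generalized inverse of the empirical CDF: smallest x with F_n(x) >= u."""
    n = len(sorted_values)
    idx = min(n - 1, max(0, math.ceil(u * n) - 1))
    return sorted_values[idx]

def sample_gaussian_copula(col_a, col_b, rho, n_samples, seed=0):
    """Draw synthetic pairs whose marginals follow col_a and col_b.

    Assumption: dependence is modelled by a Gaussian copula with
    correlation rho (chosen here by hand, not estimated from data).
    """
    rng = random.Random(seed)
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    samples = []
    for _ in range(n_samples):
        # Correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Copula step: map to uniforms, then through each inverse marginal.
        u1, u2 = normal_cdf(z1), normal_cdf(z2)
        samples.append((empirical_quantile(sorted_a, u1),
                        empirical_quantile(sorted_b, u2)))
    return samples
```

Because the empirical quantile function only returns observed values, this sketch resamples each marginal exactly; a smoother marginal estimate would be needed to produce values not present in the original data.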

Figures (9)

  • Figure 1: SynGait. Schematic overview of the proposed comprehensive framework for unit QTS synthetic data generation.
  • Figure 2: Overview of how a GAN model is trained.
  • Figure 3: Manifold approximation via k-NNG. A cloud of $n = 100$ points sampled in the space between the two red circles. Three k-NNGs are used to approximate the corresponding manifold: with $k=10$ (left panel), with $k=30$ (middle panel) and with $k=50$ (right panel). It illustrates that a value of $k$ close to $30$ is best in this case to approximate the space between the red circles.
  • Figure 4: Scatterplot of the first two principal components scores for the log-QFD of the $27$ individuals from the MYO study.
  • Figure 5: Generated individual gait patterns. Using the original data (1st column), the proposed SynGait method (2nd column), the copula method (3rd column) and the CTGAN method (4th column).
  • ...and 4 more figures
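
The k-nearest-neighbor graph (k-NNG) shown in Figure 3 can be sketched as follows. This is a brute-force illustration under the assumption of Euclidean distance between points; the function name `knn_graph` is hypothetical and the paper may build its graph differently.

```python
import math

def knn_graph(points, k):
    """Return {i: [indices of the k nearest neighbours of point i]}.

    Assumption: Euclidean distance; brute force, O(n^2 log n),
    which is adequate for small samples.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, closest first.
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph
```

The choice of `k` controls which connections the graph makes: a small `k` can leave the graph fragmented, while a large `k` adds edges that cut across regions containing no data, which is the trade-off the three panels of Figure 3 illustrate.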

Theorems & Definitions (10)

  • Definition 2.1: unit quaternion
  • Definition 2.2: unit quaternion time series
  • Definition 3.1: logarithmic map
  • Definition 3.2: exponential map
  • Definition 3.3: log-quaternion time series
  • Definition 3.4: log-quaternion functional datum
  • Definition 3.5: sample mean
  • Definition 3.6: sample covariance kernel function
  • Definition 3.7: sample covariance operator
  • Theorem 4.1: Sklar's theorem
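
The logarithmic and exponential maps listed above are what allow a unit QTS to be handled as ordinary vector-valued data and then mapped back to rotations. Below is a minimal sketch using the common convention in which the log of a unit quaternion $(w, x, y, z)$ is the 3-vector $\theta \mathbf{u}$, with $\theta$ the half rotation angle and $\mathbf{u}$ the unit axis. The paper's own definitions may differ in scaling or component ordering, and the function names are hypothetical.

```python
import math

def quat_log(q, eps=1e-12):
    """Logarithmic map: unit quaternion (w, x, y, z) -> 3-vector."""
    w, x, y, z = q
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < eps:
        # No recoverable axis: treated as the identity rotation.
        return (0.0, 0.0, 0.0)
    theta = math.atan2(norm_v, w)  # half rotation angle, in [0, pi]
    scale = theta / norm_v
    return (scale * x, scale * y, scale * z)

def quat_exp(v, eps=1e-12):
    """Exponential map: 3-vector -> unit quaternion (w, x, y, z)."""
    x, y, z = v
    theta = math.sqrt(x * x + y * y + z * z)
    if theta < eps:
        return (1.0, 0.0, 0.0, 0.0)
    scale = math.sin(theta) / theta
    return (math.cos(theta), scale * x, scale * y, scale * z)

def log_qts(qts):
    """Apply the log map pointwise to a unit quaternion time series."""
    return [quat_log(q) for q in qts]
```

Applying `quat_exp` to the output of `quat_log` recovers the original quaternion whenever its vector part is nonzero, so tabular methods can operate on the log-transformed series and the result can be mapped back to unit quaternions afterwards.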