GenGait: A Transformer-Based Model for Human Gait Anomaly Detection and Normative Twin Generation

Elisa Motta, Marta Lorenzini, Clara Mouawad, Alberto Ranavolo, Mariano Serrao, Arash Ajoudani

Abstract

Gait analysis provides an objective characterization of locomotor function and is widely used to support diagnosis and rehabilitation monitoring across neurological and orthopedic disorders. Deep learning has been increasingly applied to this domain, yet most approaches rely on supervised classifiers trained on disease-labeled data, limiting generalization to heterogeneous pathological presentations. This work proposes a label-free framework for joint-level anomaly detection and kinematic correction based on a Transformer masked autoencoder trained exclusively on normative gait sequences from 150 adults, acquired with a markerless multi-camera motion-capture system. At inference, a two-pass procedure is applied to potentially pathological input sequences. First, it estimates joint inconsistency scores by occluding individual joints and measuring deviations from the learned normative prior. Then, it withholds the flagged joints from the encoder input and reconstructs the full skeleton from the remaining spatiotemporal context, yielding corrected kinematic trajectories at the flagged positions. Validation on 10 held-out normative participants, who mimicked seven simulated gait abnormalities, showed accurate localization of biomechanically inconsistent joints, a significant reduction in angular deviation across all analyzed joints with large effect sizes, and preservation of normative kinematics. The proposed approach enables interpretable, subject-specific localization of gait impairments without requiring disease labels. Video is available at https://youtu.be/Rcm3jqR5pN4.
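
The two-pass inference procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `occlusion_scores`, `two_pass`, and `toy_reconstruct`, the mean-absolute-error score, the fixed threshold, and the one-scalar-per-joint data layout are all assumptions made for illustration. In particular, `toy_reconstruct` is a stand-in for the trained masked autoencoder and simply fills hidden joints from a fixed template.

```python
from typing import Callable, List, Set, Tuple

# Simplified layout (assumption): window[t][j] holds one scalar per joint per frame.
Window = List[List[float]]
Reconstructor = Callable[[Window, Set[int]], Window]


def occlusion_scores(window: Window, reconstruct: Reconstructor) -> List[float]:
    """Pass 1: hide one joint at a time and measure how far the input at
    that joint lies from the model's reconstruction (mean absolute error)."""
    n_frames, n_joints = len(window), len(window[0])
    scores = []
    for j in range(n_joints):
        recon = reconstruct(window, {j})
        err = sum(abs(recon[t][j] - window[t][j]) for t in range(n_frames))
        scores.append(err / n_frames)
    return scores


def two_pass(window: Window, reconstruct: Reconstructor,
             threshold: float) -> Tuple[Window, Set[int]]:
    """Pass 2: withhold all flagged joints at once and replace only those
    positions with the reconstruction; unflagged joints stay untouched."""
    scores = occlusion_scores(window, reconstruct)
    flagged = {j for j, s in enumerate(scores) if s > threshold}
    if not flagged:
        return [row[:] for row in window], flagged
    recon = reconstruct(window, flagged)
    corrected = [
        [recon[t][j] if j in flagged else row[j] for j in range(len(row))]
        for t, row in enumerate(window)
    ]
    return corrected, flagged


# Hypothetical stand-in for the trained model: hidden joints are filled
# with a fixed "normative" template value, visible joints are copied.
TEMPLATE = [0.0, 1.0, 2.0]


def toy_reconstruct(window: Window, hidden: Set[int]) -> Window:
    return [[TEMPLATE[j] if j in hidden else row[j] for j in range(len(row))]
            for row in window]


# Seven frames, three joints; joint 2 deviates from the template.
window = [[0.0, 1.0, 5.0] for _ in range(7)]
corrected, flagged = two_pass(window, toy_reconstruct, threshold=0.5)
```

In this toy run only joint 2 exceeds the threshold, so it is the only joint whose trajectory is replaced. How scores are actually thresholded in the paper is not stated in this excerpt.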

Paper Structure

This paper contains 12 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Pipeline overview. Five cameras at 30 Hz yield 3D joint positions. Pre-processing estimates missing joints via constrained interpolation and tokenizes sequences into $J{\times}T$ joint–frame tokens using a 7-frame sliding window with stride 1. Pass 1 determines the masking pattern, producing mask $m$; Pass 2 reconstructs the masked joints using an MAE Transformer. Post-processing segments gait cycles via left heel-height peak detection.
  • Figure 2: Masked autoencoder Transformer for joint reconstruction (Pass 2). A 7-frame window is tokenized into joint–frame tokens and linearly projected, then indexed by a sinusoidal positional code $\mathbf{P}$ and three learned embeddings $\mathbf{E}$ (joint type, frame index, and motion/velocity). A mask pattern $m$ (provided by Pass 1) specifies which token positions are hidden. The Token Masking operator replaces the selected tokens with a learned [MASK] placeholder, yielding a fixed-length masked sequence (visible tokens + [MASK]) processed by an 8-layer Transformer encoder. The encoder outputs a full-length memory sequence; visible memory vectors are selected using $\mathit{memory} \setminus m$ and injected into the decoder input via Token Assembly, while masked positions are filled with [MASK]; $\mathbf{P}$ and $\mathbf{E}$ are added again before the 2-layer decoder. The Transformer decoder reconstructs the full token window, from which the reconstructed last-frame tokens are retained as the corrected pose estimate at inference.
  • Figure 3: Pass 1: mask identification. Training uses a curriculum of synthetic masks (random $\rightarrow$ structured) with temporally coherent spans to produce the mask list $m$. Inference uses tiled occlusions and a badness score $B_j$ to select unreliable joints and produce $m$. In both cases, $m$ is then passed to Pass 2 for masked reconstruction.
  • Figure 4: Skeletal reconstruction for the normative trial and the seven simulated anomalies. Each panel shows a different participant. Joints flagged as biomechanically inconsistent by Pass 1 are highlighted in red. Reconstructed skeletons (blue) are overlaid with input ones (gray). Participant IDs and flagged joints are labeled above each panel. Video animations for all conditions are available at https://youtu.be/Rcm3jqR5pN4.
  • Figure 5: Joint angle trajectories across the gait cycle for the normative trial and the seven simulated anomalies. Representative examples show pelvis flexion/extension (F/E), right hip abduction/adduction (A/A), right F/E, and right knee F/E for selected participant-anomaly pairs. The blue trajectory and band represent the normative reference ($\mu \pm 2\sigma$); original mean trajectories are shown in red and reconstructed mean trajectories in green.
  • ...and 1 more figures
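
The pre- and post-processing steps named in the Figure 1 caption (7-frame sliding window with stride 1, joint–frame tokens, and gait-cycle segmentation from left heel-height peaks) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function names, the `(joint, frame, value)` token layout, the simple local-maximum peak rule, and the `min_height` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(frames: Sequence, size: int = 7, stride: int = 1) -> List[list]:
    """Split a frame sequence into overlapping windows of `size` frames."""
    return [list(frames[i:i + size])
            for i in range(0, len(frames) - size + 1, stride)]


def tokenize(window: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Flatten a T x J window into J*T joint-frame tokens.
    Each token is (joint index, frame index, value); layout is assumed."""
    return [(j, t, window[t][j])
            for t in range(len(window))
            for j in range(len(window[t]))]


def heel_peaks(heel_height: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Indices of local maxima in the left heel-height signal.
    A plain neighbour comparison; a real pipeline would likely smooth first."""
    h = heel_height
    return [i for i in range(1, len(h) - 1)
            if h[i - 1] < h[i] >= h[i + 1] and h[i] >= min_height]


def gait_cycles(heel_height: Sequence[float],
                min_height: float = 0.0) -> List[Tuple[int, int]]:
    """Treat each pair of consecutive heel-height peaks as one gait cycle."""
    peaks = heel_peaks(heel_height, min_height)
    return list(zip(peaks[:-1], peaks[1:]))


# Ten frames give four 7-frame windows at stride 1.
windows = sliding_windows(list(range(10)))

# A 7-frame window with two joints gives 14 joint-frame tokens.
tokens = tokenize([[0.1 * t, 0.2 * t] for t in range(7)])

# Synthetic heel-height trace with three peaks, hence two cycles.
heel = [0, 1, 3, 1, 0, 1, 4, 1, 0, 2, 5, 2, 0]
cycles = gait_cycles(heel)
```

Because the caption states that only the reconstructed last-frame tokens of each window are kept at inference, a stride of 1 yields one corrected pose per input frame once the first full window is available.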