Towards Unconstrained 2D Pose Estimation of the Human Spine

Muhammad Saif Ullah Khan, Stephan Krauß, Didier Stricker

TL;DR

SpineTrack addresses the lack of detailed spine articulation in 2D pose datasets by introducing nine vertebral keypoints and a combined real/synthetic dataset. The SpinePose framework uses teacher–student knowledge distillation and four targeted losses to integrate spine landmarks into existing body pose networks while enforcing anatomical plausibility. Biomechanical validation via OpenSim and an active-learning annotation pipeline ensure anatomically consistent spine labels in real-world images. Experiments demonstrate spine-aware improvements on SpineTrack without sacrificing performance on standard benchmarks, paving the way for precise biomechanical analysis and 3D spine reconstruction in unconstrained images.

Abstract

We present SpineTrack, the first comprehensive dataset for 2D spine pose estimation in unconstrained settings, addressing a crucial need in sports analytics, healthcare, and realistic animation. Existing pose datasets often simplify the spine to a single rigid segment, overlooking the nuanced articulation required for accurate motion analysis. In contrast, SpineTrack annotates nine detailed spinal keypoints across two complementary subsets: a synthetic set comprising 25k annotations created using Unreal Engine with biomechanical alignment through OpenSim, and a real-world set comprising over 33k annotations curated via an active learning pipeline that iteratively refines automated annotations with human feedback. This integrated approach ensures anatomically consistent labels at scale, even for challenging, in-the-wild images. We further introduce SpinePose, extending state-of-the-art body pose estimators using knowledge distillation and an anatomical regularization strategy to jointly predict body and spine keypoints. Our experiments in both general and sports-specific contexts validate the effectiveness of SpineTrack for precise spine pose estimation, establishing a robust foundation for future research in advanced biomechanical analysis and 3D spine reconstruction in the wild.

Paper Structure

This paper contains 25 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: SpineTrack-Unreal Generation. We show the data generation pipeline with Unreal Engine [unrealengine], which we use to obtain synthetic images and ground-truth locations of keypoints roughly corresponding to the SpineTrack skeleton. Keypoint positions are refined by scaling the biomechanical model to each actor and using an inverse kinematics solver to recompute marker positions. Images are augmented with diverse scenes from the OpenImagesV7 dataset [OpenImages] using SAM [kirillov2023segment] to simulate more realistic backgrounds.
  • Figure 2: Spine Keypoints. We annotate nine keypoints along the spinal column to capture upper body movement without excessive overhead. Three are placed on the cervical spine (C1, C4, C7), two on the thoracic spine (T3, T8), three on the lumbar spine (L1, L3, L5), and one at the sacrum near the pelvis. This distribution balances anatomical realism and annotation cost, reflecting the distinct mobility of each spinal region.
  • Figure 3: SpineTrack-Real Creation. On the left, we show an OpenSim [seth2011opensim] model adapted from [rajagopal2016full, beaucage2019validation, pagnon2021pose2sim] with all 35 anatomical markers labeled in our dataset, including 17 COCO keypoints [lin2014microsoft], head top, feet (six), spine (nine), and two sternoclavicular joints. On the right, our iterative pipeline begins with real-world image selection and initial pseudo-labels generated by a pretrained model. A preliminary spine-aware model is trained on these labels plus SpineTrack-Unreal. Annotations are then refined in batches, where human annotators correct model predictions, and the improved labels are fed back into the training data to fine-tune the model. This process repeats for all batches, improving model accuracy and reducing data noise at each step.
  • Figure 4: SpineTrack Dataset. Example images illustrating the range of human activities, occlusions, and body shapes in SpineTrack, with detailed spine keypoints and standard body landmarks.
  • Figure 5: SpinePose Architecture. Our teacher–student approach for 2D spine pose estimation integrates knowledge from a pretrained body expert with newly introduced spine keypoints. The student model expands the teacher’s head to predict both body and spine heatmaps, using a combination of distillation losses, a positional keypoint loss, and our structure-based losses to achieve anatomically consistent predictions. The final objective balances accuracy on existing body joints with the newly added vertebral keypoints, ensuring robust performance in real-world scenarios.
  • ...and 1 more figure
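
The marker layout described in the Figure 2 and Figure 3 captions can be written down as a small data structure. The following is only a minimal sketch: the region grouping and the counts come from the captions, while the variable names, the function names, and the top-to-bottom ordering of the spine keypoints are assumptions made for illustration and do not reflect the dataset's actual annotation format or index order.

```python
# Sketch of the SpineTrack marker layout as described in the figure captions.
# All identifiers here are hypothetical; the real annotation schema may differ.

# Nine spine keypoints grouped by spinal region (Figure 2 caption).
SPINE_KEYPOINTS = {
    "cervical": ["C1", "C4", "C7"],
    "thoracic": ["T3", "T8"],
    "lumbar": ["L1", "L3", "L5"],
    "sacral": ["sacrum"],
}

# Marker counts per group (Figure 3 caption).
MARKER_GROUPS = {
    "coco_body": 17,
    "head_top": 1,
    "feet": 6,
    "spine": sum(len(names) for names in SPINE_KEYPOINTS.values()),
    "sternoclavicular": 2,
}


def spine_keypoint_names():
    """Return the spine keypoints in an assumed neck-to-pelvis order."""
    return [name for region in SPINE_KEYPOINTS.values() for name in region]


def total_markers():
    """Return the total number of anatomical markers across all groups."""
    return sum(MARKER_GROUPS.values())


if __name__ == "__main__":
    print(spine_keypoint_names())
    print(total_markers())
```

Summing the groups reproduces the 35 anatomical markers stated in the Figure 3 caption (17 + 1 + 6 + 9 + 2), which serves as a quick consistency check on the two captions.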