BlazePose: On-device Real-time Body Pose tracking
Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann
TL;DR
BlazePose tackles real-time, on-device 2D pose estimation for a single person by combining heatmap supervision during training with a lightweight regression stage for efficient inference. The approach relies on a detector-tracker pipeline, BlazeFace-based person localization, a 33-keypoint topology, and occlusion-aware augmentation to enable robust tracking on mobile hardware. It reports accuracy competitive with OpenPose while running substantially faster, at real-time rates on mid-range phones, which enables practical applications such as sign language recognition, fitness tracking, and AR. The work also lays groundwork for extension to 3D and for integration with hand and face geometry modules.
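To make the detector-tracker idea concrete, the following Python sketch runs an expensive person detector only when no region of interest (ROI) is available, and otherwise reuses an ROI derived from the previous frame's keypoints. When the pose estimate's confidence drops, the ROI is discarded and the detector runs again on the next frame. The callables `detect` and `estimate`, the confidence threshold, and the margin-based `roi_from_keypoints` heuristic are hypothetical stand-ins, not BlazePose's actual models or parameters.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def roi_from_keypoints(keypoints: Sequence[Point], margin: float = 0.25) -> Box:
    """Axis-aligned box around the keypoints, enlarged by `margin` of its size per side.

    Hypothetical heuristic for illustration only.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return Box(min(xs) - margin * w, min(ys) - margin * h,
               max(xs) + margin * w, max(ys) + margin * h)


def track(
    frames: Sequence[Any],
    detect: Callable[[Any], Optional[Box]],
    estimate: Callable[[Any, Box], Tuple[List[Point], float]],
    min_confidence: float = 0.5,
) -> Tuple[List[Optional[List[Point]]], List[int]]:
    """Run the detector only when no ROI is available; otherwise track from the last ROI."""
    roi: Optional[Box] = None
    results: List[Optional[List[Point]]] = []
    detector_calls: List[int] = []
    for i, frame in enumerate(frames):
        if roi is None:
            roi = detect(frame)              # expensive path: full-frame detection
            detector_calls.append(i)
            if roi is None:                  # nobody found in this frame
                results.append(None)
                continue
        keypoints, confidence = estimate(frame, roi)  # cheap path: pose inside the ROI
        if confidence < min_confidence:
            roi = None                       # tracking lost: re-detect on the next frame
            results.append(None)
        else:
            results.append(keypoints)
            roi = roi_from_keypoints(keypoints)  # ROI reused for the next frame
    return results, detector_calls


# Toy stand-ins: each "frame" is simply the confidence the fake estimator reports.
def toy_detect(frame: Any) -> Optional[Box]:
    return Box(0.0, 0.0, 1.0, 1.0)


def toy_estimate(frame: Any, roi: Box) -> Tuple[List[Point], float]:
    return [(0.4, 0.4), (0.6, 0.6)], float(frame)


if __name__ == "__main__":
    results, calls = track([0.9, 0.9, 0.2, 0.9, 0.9], toy_detect, toy_estimate)
    print(calls)  # prints [0, 3]
```

In the toy run the detector is invoked only on frame 0 and again on frame 3, after tracking is lost on frame 2; this is what lets a detector-tracker pipeline avoid running detection on every frame.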
Abstract
We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body pose estimation neural network that uses both heatmaps and regression to keypoint coordinates.
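The pairing of heatmaps with coordinate regression can be illustrated with a generic two-term training objective: a per-keypoint Gaussian heatmap target scored with mean squared error, plus an L1 penalty on directly regressed (x, y) coordinates. This NumPy sketch is not the authors' loss; the Gaussian width `sigma`, the weight `reg_weight`, and the MSE/L1 choices are assumptions made for illustration.

```python
import numpy as np


def gaussian_heatmap(center, size: int, sigma: float = 1.5) -> np.ndarray:
    """Render a (size, size) Gaussian target peaked at center = (x, y) in heatmap pixels."""
    ys, xs = np.mgrid[0:size, 0:size]  # row (y) and column (x) index grids
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def combined_loss(pred_heatmaps: np.ndarray, pred_coords: np.ndarray,
                  true_coords: np.ndarray, size: int,
                  reg_weight: float = 1.0, sigma: float = 1.5) -> float:
    """Heatmap MSE plus weighted L1 on regressed coordinates.

    pred_heatmaps: (K, size, size); pred_coords, true_coords: (K, 2) as (x, y).
    """
    targets = np.stack([gaussian_heatmap(c, size, sigma) for c in true_coords])
    heatmap_loss = np.mean((pred_heatmaps - targets) ** 2)  # dense spatial supervision
    coord_loss = np.mean(np.abs(pred_coords - true_coords))  # direct coordinate supervision
    return float(heatmap_loss + reg_weight * coord_loss)


if __name__ == "__main__":
    true = np.array([[3.0, 4.0], [7.0, 2.0]])
    perfect = np.stack([gaussian_heatmap(c, 10) for c in true])
    print(combined_loss(perfect, true, true, size=10))        # 0.0
    print(combined_loss(perfect, true + 1.0, true, size=10))  # 1.0
```

One motivation for such hybrids is that the dense heatmap term gives a spatially rich training signal, while a regression output lets inference skip heatmap decoding and return coordinates directly.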
