BlazePose: On-device Real-time Body Pose tracking

Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann

TL;DR

BlazePose tackles real-time, on-device 2D pose estimation for a single person by combining heatmap-supervised training with a lightweight regression stage, which keeps inference efficient. The approach relies on a detector-tracker pipeline, BlazeFace-based person localization, a 33-keypoint topology, and occlusion-aware augmentation to enable robust tracking on mobile hardware. It demonstrates competitive accuracy and substantial speedups over OpenPose on mid-range phones, enabling practical applications such as sign language recognition, fitness tracking, and AR. The work also lays groundwork for extension to 3D and integration with hand and facial geometry modules.

Abstract

We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body pose estimation neural network that uses both heatmaps and regression to keypoint coordinates.
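The detector-tracker idea mentioned above can be illustrated with a minimal sketch: a person detector runs only when no region of interest is available, and otherwise the region for the next frame is derived from the keypoints of the current one. This is a sketch under stated assumptions, not the authors' implementation. The names `PoseResult`, `box_from_keypoints`, `track`, the `presence` score, the `min_presence` threshold, and the margin value are all hypothetical, and deriving the region from a padded keypoint bounding box is a simplification chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


@dataclass
class PoseResult:
    """Hypothetical output of a pose network for one cropped frame."""
    keypoints: List[Point]  # (x, y) pairs, e.g. 33 of them
    presence: float         # assumed score: is a person visible in the crop?


def box_from_keypoints(keypoints: Sequence[Point], margin: float = 0.25) -> Box:
    """Pad the tight keypoint bounding box to get the next region of interest."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)


def track(
    frames: Iterable,
    detect: Callable[[object], Optional[Box]],
    estimate: Callable[[object, Box], PoseResult],
    min_presence: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Optional[PoseResult]]:
    """Run the detector only when tracking has no region of interest."""
    roi: Optional[Box] = None
    poses: List[Optional[PoseResult]] = []
    for frame in frames:
        if roi is None:
            roi = detect(frame)  # slow path: full-frame person detection
        if roi is None:
            poses.append(None)   # nobody found in this frame
            continue
        result = estimate(frame, roi)
        if result.presence < min_presence:
            roi = None           # track lost: re-detect on the next frame
            poses.append(None)
        else:
            roi = box_from_keypoints(result.keypoints)  # fast path next frame
            poses.append(result)
    return poses
```

In this sketch the expensive detector is called once per track rather than once per frame, which is the general reason a detector-tracker split helps on mobile hardware.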

Paper Structure

This paper contains 10 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Inference pipeline. See text.
  • Figure 2: Vitruvian man aligned via our detector vs. face detection bounding box. See text for details.
  • Figure 3: 33 keypoint topology.
  • Figure 4: Network architecture. See text for details.
  • Figure 5: BlazePose results on upper-body case.
  • ...and 1 more figure