MediaPipe Hands: On-device Real-time Hand Tracking
Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann
TL;DR
This work presents a real-time, on-device hand tracking pipeline that predicts 2.5D hand landmarks from RGB input using two models, the BlazePalm palm detector and a hand landmark regressor, both implemented in MediaPipe. The system uses frame-to-frame propagation to reduce detector calls and is trained on a combination of real and synthetic datasets to improve accuracy and depth estimation. Key contributions include the mobile-optimized detector, a robust 21-landmark model with depth supervision, and an open-source MediaPipe implementation that enables cross-platform AR and gesture applications. The approach achieves real-time performance on commodity devices and supports multi-hand tracking with practical gating and synchronization mechanisms. Overall, MediaPipe Hands provides a practical, extensible solution for on-device hand tracking and interaction in AR/VR contexts.
Abstract
We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector and 2) a hand landmark model. It is implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrate real-time inference speed on mobile GPUs and high prediction quality. MediaPipe Hands is open sourced at https://mediapipe.dev.
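The two-model pipeline and the frame-to-frame propagation mentioned in the summary can be illustrated with a minimal Python sketch. This is not the MediaPipe implementation: `HandTracker`, `roi_from_landmarks`, the `detect_palm` and `predict_landmarks` callables, the `min_presence` threshold, and the `margin` padding are all hypothetical names and values chosen for illustration. The sketch assumes the landmark model also returns a hand-presence score, and that the detector is re-run only when no hand region is currently being tracked.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]       # (x_min, y_min, x_max, y_max), normalized
Landmarks = List[Tuple[float, float, float]]  # 21 points of (x, y, relative depth)


def roi_from_landmarks(landmarks: Landmarks, margin: float = 0.1) -> Box:
    """Bounding box around the given landmarks, padded by `margin` (assumed value)."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


class HandTracker:
    """Hypothetical single-hand tracking loop: detector first, then landmarks."""

    def __init__(
        self,
        detect_palm: Callable[[object], Optional[Box]],
        predict_landmarks: Callable[[object, Box], Tuple[Landmarks, float]],
        min_presence: float = 0.5,  # assumed threshold, not taken from the paper
    ) -> None:
        self.detect_palm = detect_palm
        self.predict_landmarks = predict_landmarks
        self.min_presence = min_presence
        self.roi: Optional[Box] = None
        self.detector_calls = 0

    def process(self, frame: object) -> Optional[Landmarks]:
        # Run the palm detector only when no region is carried over.
        if self.roi is None:
            self.detector_calls += 1
            self.roi = self.detect_palm(frame)
            if self.roi is None:
                return None  # no hand found in this frame

        landmarks, presence = self.predict_landmarks(frame, self.roi)
        if presence < self.min_presence:
            # Tracking lost: drop the region so the next frame re-runs the detector.
            self.roi = None
            return None

        # Propagate: the next frame's region comes from the current landmarks.
        self.roi = roi_from_landmarks(landmarks)
        return landmarks
```

With stub models plugged in, a sequence of frames in which the hand stays visible triggers the detector once, on the first frame only; every later frame reuses the region derived from the previous landmarks.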
