RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose

Tao Jiang; Peng Lu; Li Zhang; Ningsheng Ma; Rui Han; Chengqi Lyu; Yining Li; Kai Chen

TL;DR

RTMPose addresses the gap between academic pose-estimation accuracy and industrial real-time requirements by adopting a top-down pipeline with a CSPNeXt backbone and a SimCC-based coordinate-classification head for keypoint localization. It combines targeted training strategies, module-level refinements, and an optimized inference pipeline (including skip-frame detection and temporal smoothing) to deliver real-time multi-person pose estimation on CPU, GPU, and mobile backends. RTMPose-m achieves 75.8% AP on COCO at 90+ FPS on an Intel i7-11700 CPU and 430+ FPS on an NVIDIA GTX 1660 Ti GPU, RTMPose-l achieves 67.0% AP on COCO-WholeBody at 130+ FPS, and RTMPose-s reaches 72.2% AP on COCO at 70+ FPS on a Snapdragon 865 chip; the models are deployable via MMDeploy. The work also provides an empirical study of paradigm, backbone, localization, training, and deployment choices, offering a practical, open-source solution for industrial real-time pose estimation.
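
The skip-frame detection and temporal smoothing mentioned above can be illustrated with a minimal single-person sketch. It rests on several assumptions that do not come from the summary itself: the detector is called only every `det_interval` frames, intermediate boxes are derived from the previous pose with a padding `margin`, and an exponential moving average stands in for whatever smoothing filter the actual pipeline uses. The names `run_pipeline`, `box_from_pose`, `detect`, and `estimate` are hypothetical and are not part of MMPose or MMDeploy.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pose = List[Tuple[float, float]]         # one (x, y) per keypoint


def box_from_pose(pose: Pose, margin: float = 0.25) -> Box:
    """Build a padded bounding box around the keypoints of a pose."""
    xs = [x for x, _ in pose]
    ys = [y for _, y in pose]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    return (min(xs) - pad_x, min(ys) - pad_y, max(xs) + pad_x, max(ys) + pad_y)


def run_pipeline(
    frames: Iterable,
    detect: Callable[[object], Box],        # hypothetical person detector
    estimate: Callable[[object, Box], Pose],  # hypothetical pose estimator
    det_interval: int = 5,                  # assumed value, for illustration
    alpha: float = 0.5,                     # EMA weight of the newest pose
) -> List[Pose]:
    """Run the detector every `det_interval` frames and smooth the poses."""
    outputs: List[Pose] = []
    box: Optional[Box] = None
    smoothed: Optional[Pose] = None
    for index, frame in enumerate(frames):
        # Call the (expensive) detector only on scheduled frames.
        if box is None or index % det_interval == 0:
            box = detect(frame)
        pose = estimate(frame, box)
        # Exponential moving average as a stand-in smoothing filter.
        if smoothed is None:
            smoothed = list(pose)
        else:
            smoothed = [
                (alpha * x + (1 - alpha) * sx, alpha * y + (1 - alpha) * sy)
                for (x, y), (sx, sy) in zip(pose, smoothed)
            ]
        outputs.append(smoothed)
        # Reuse the current pose as the box for the next, detector-free frame.
        box = box_from_pose(pose)
    return outputs
```

With `det_interval=5`, a ten-frame clip triggers the detector on frames 0 and 5 only; the remaining frames reuse boxes derived from the preceding pose, which is where the latency saving would come from under these assumptions.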

Abstract

Recent studies on 2D pose estimation have achieved excellent performance on public benchmarks, yet their application in industry still suffers from heavy model parameters and high latency. To bridge this gap, we empirically explore key factors in pose estimation, including paradigm, model architecture, training strategy, and deployment, and present a high-performance real-time multi-person pose estimation framework, RTMPose, based on MMPose. Our RTMPose-m achieves 75.8% AP on COCO with 90+ FPS on an Intel i7-11700 CPU and 430+ FPS on an NVIDIA GTX 1660 Ti GPU, and RTMPose-l achieves 67.0% AP on COCO-WholeBody with 130+ FPS. To further evaluate RTMPose's capability in critical real-time applications, we also report its performance after deployment on a mobile device. Our RTMPose-s achieves 72.2% AP on COCO with 70+ FPS on a Snapdragon 865 chip, outperforming existing open-source libraries. Code and models are released at https://github.com/open-mmlab/mmpose/tree/1.x/projects/rtmpose.

Paper Structure (32 sections, 6 equations, 5 figures, 12 tables)

Introduction
Related Work
Bottom-up Approaches.
Top-down Approaches.
Coordinate Classification.
Vision Transformers.
Methodology
SimCC: A lightweight yet strong baseline
Preliminary
Baseline
Training Techniques
Pre-training
Optimization Strategy
Two-stage training augmentations
Module Design
...and 17 more sections

Figures (5)

Figure 1: Comparison of RTMPose and open-source libraries on COCO val set regarding model size, latency, and precision. The circle size represents the relative size of model parameters.
Figure 2: The overall architecture of RTMPose, which contains a convolutional layer, a fully-connected layer, and a Gated Attention Unit (GAU) to refine K keypoint representations. After that, 2D pose estimation is treated as two classification tasks, one for x-axis and one for y-axis coordinates, to predict the horizontal and vertical locations of keypoints.
Figure 3: Step-by-step improvements from a SimCC baseline.
Figure 4: Inference pipeline of RTMPose.
Figure 5: Comparison of GFLOPs and accuracy. Left: Comparison of RTMPose and other open-source pose estimation libraries on full COCO val set. Right: Comparison of RTMPose and other open-source pose estimation libraries on COCO-SinglePerson val set.
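
The coordinate-classification idea in the Figure 2 caption can be sketched as a small decoding routine. This is an illustration under stated assumptions rather than the MMPose implementation: each keypoint is assumed to have one vector of scores over horizontal bins and one over vertical bins, each bin covers `1 / split_ratio` pixels (the default of 2.0 here is an assumed value), and the keypoint confidence is taken as the smaller of the two peak scores. The function name `decode_simcc` is hypothetical.

```python
from typing import List, Sequence, Tuple


def decode_simcc(
    x_scores: Sequence[Sequence[float]],
    y_scores: Sequence[Sequence[float]],
    split_ratio: float = 2.0,  # assumed number of bins per pixel
) -> List[Tuple[float, float, float]]:
    """Turn per-axis classification scores into (x, y, confidence) keypoints."""
    keypoints = []
    for kx, ky in zip(x_scores, y_scores):
        # Index of the highest-scoring bin along each axis.
        x_bin = max(range(len(kx)), key=kx.__getitem__)
        y_bin = max(range(len(ky)), key=ky.__getitem__)
        # Assumed confidence rule: the weaker of the two axis peaks.
        confidence = min(kx[x_bin], ky[y_bin])
        # Map bin indices back to pixel coordinates.
        keypoints.append((x_bin / split_ratio, y_bin / split_ratio, confidence))
    return keypoints
```

Because each axis is classified independently, the output for an input of width W and height H needs only W × `split_ratio` plus H × `split_ratio` scores per keypoint, rather than a full 2D heatmap.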
