3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement

Filipa Lino; Carlos Santiago; Manuel Marques

TL;DR

Occlusions are a major challenge for monocular 3D human pose estimation (HPE). The authors introduce BlendMimic3D, a synthetic, occlusion-rich dataset generated in Blender for training and benchmarking 3D pose estimation under occlusions, and a Graph Convolutional Network (GCN) pose refinement block that plugs into existing 3D HPE backbones without retraining them. Trained on BlendMimic3D, the GCN refines 3D poses over a spatio-temporal graph of joints. It improves occlusion handling across multiple backbones (VideoPose3D, PoseFormerV2, D3DP) and 2D detectors (CPN, Detectron2), substantially reducing the mean per-joint position error (MPJPE) in occluded scenarios while preserving accuracy in non-occluded ones. The work offers a benchmark and a refinement mechanism that existing pipelines can adopt with minimal changes, a practical step toward occlusion-robust 3D HPE in real-world applications.
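For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The following is a minimal sketch of that definition only; the function name `mpjpe`, the nested-list input layout, and the toy values are illustrative assumptions, and any root alignment or evaluation protocol used in the paper is not reproduced here.

```python
import math


def mpjpe(pred, gt):
    """Average Euclidean distance between matching predicted and true joints.

    pred, gt: sequences of frames; each frame is a sequence of (x, y, z) joints.
    The result is in the same unit as the inputs (often millimetres).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    total, count = 0.0, 0
    for frame_pred, frame_gt in zip(pred, gt):
        if len(frame_pred) != len(frame_gt):
            raise ValueError("each frame must have the same number of joints")
        for joint_pred, joint_gt in zip(frame_pred, frame_gt):
            total += math.dist(joint_pred, joint_gt)  # per-joint 3D error
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy data (hypothetical): one frame, two joints.
gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]]
error = mpjpe(pred, gt)  # joint errors are 5.0 and 0.0, so the mean is 2.5
```

A lower value means the predicted skeleton is closer to the ground truth, which is why a reduction in MPJPE is read as an improvement.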

Abstract

In the field of 3D Human Pose Estimation (HPE), accurately estimating human pose, especially in scenarios with occlusions, is a significant challenge. This work identifies and addresses a gap in the current state of the art in 3D HPE concerning the scarcity of data and strategies for handling occlusions. We introduce our novel BlendMimic3D dataset, designed to mimic real-world situations in which occlusions occur and to integrate seamlessly with 3D HPE algorithms. Additionally, we propose a 3D pose refinement block, employing a Graph Convolutional Network (GCN) to enhance pose representation through a graph model. This GCN block acts as a plug-and-play solution, adaptable to various 3D HPE frameworks without retraining them. By training the GCN with occluded data from BlendMimic3D, we demonstrate significant improvements in resolving occluded poses, with comparable results for non-occluded ones. The project web page is available at https://blendmimic3d.github.io/BlendMimic3D/.

Paper Structure (18 sections, 3 equations, 10 figures, 3 tables)

Introduction
Related Work
From 2D to 3D Transition
2D HPE
3D HPE
Graph Convolutional Network
HPE Datasets
BlendMimic3D Dataset
Pose Refinement with GCN
Experimental Setup
Datasets and Evaluation Metrics
Implementation Details
Experimental Results
Quantitative results
Qualitative results
...and 3 more sections

Figures (10)

Figure 1: BlendMimic3D, our synthetic dataset for 3D HPE occlusion benchmarking, features diverse multi-camera scenarios with up to three subjects. It includes Blender animations (top left), keypoint visibility (top right), cameras' parameters, 3D poses (bottom left) and 2D pose representations (bottom right).
Figure 2: Visual representation of different scenes from the BlendMimic3D dataset. From left to right: synthetic subjects SS1, SS2 and SS3.
Figure 3: Left: Camera distribution with the world coordinate system at the origin, with subject SS1 of BlendMimic3D dataset. Right: Visualization of 3D character armature, highlighting the specific keypoints used for coordinate extraction.
Figure 4: Overview of the proposed framework. After any chosen 3D HPE algorithm, our Graph Convolutional Network (GCN) refines the estimated 3D poses by integrating spatial and temporal insights, leading to enhanced and precise 3D pose estimation, particularly effective in handling occlusions.
Figure 5: Illustration of the graph dynamics for the right elbow keypoint, with neighboring nodes categorized into six classes: (1) Center (red). (2) Physically-connected node closer to the spine (blue). (3) Physically-connected node farther from the spine (green). (4) Symmetric node (pink). (5) Time-forward node (orange). (6) Time-backward node (yellow).
...and 5 more figures
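
The six neighbor classes described in the Figure 5 caption can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the reduced skeleton in `BONES`, the joint names, the `SYMMETRIC` map, and the helper `neighbor_classes` are all hypothetical, and "closer to the spine" is approximated here by hop distance from a single `spine` node.

```python
from collections import deque

# Hypothetical reduced skeleton (NOT the paper's joint layout).
BONES = [
    ("spine", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist"),
    ("spine", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
]
# Hypothetical left/right counterparts, stored in both directions.
SYMMETRIC = {"r_shoulder": "l_shoulder", "r_elbow": "l_elbow", "r_wrist": "l_wrist"}
SYMMETRIC.update({left: right for right, left in list(SYMMETRIC.items())})


def build_adjacency(bones):
    """Undirected adjacency map built from a list of bones."""
    adj = {}
    for a, b in bones:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hops_from(root, adj):
    """Breadth-first hop distance of every joint from the root joint."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def neighbor_classes(joint, frame, num_frames):
    """Group the neighbors of (joint, frame) into the six classes of Figure 5."""
    adj = build_adjacency(BONES)
    depth = hops_from("spine", adj)
    classes = {
        "center": [(joint, frame)],
        "closer_to_spine": [],
        "farther_from_spine": [],
        "symmetric": [],
        "time_forward": [],
        "time_backward": [],
    }
    for nb in sorted(adj[joint]):  # physically connected joints, same frame
        key = "closer_to_spine" if depth[nb] < depth[joint] else "farther_from_spine"
        classes[key].append((nb, frame))
    if joint in SYMMETRIC:  # mirrored joint, same frame
        classes["symmetric"].append((SYMMETRIC[joint], frame))
    if frame + 1 < num_frames:  # same joint, next frame
        classes["time_forward"].append((joint, frame + 1))
    if frame - 1 >= 0:  # same joint, previous frame
        classes["time_backward"].append((joint, frame - 1))
    return classes


elbow = neighbor_classes("r_elbow", frame=1, num_frames=3)
```

With this toy skeleton, the right elbow at frame 1 gets the right shoulder as its closer-to-spine neighbor, the right wrist as its farther neighbor, the left elbow as its symmetric node, and its own copies at frames 2 and 0 as the temporal neighbors. In a partitioned graph convolution, each class would typically receive its own weight matrix; whether the paper does exactly this is not stated in the caption.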
