Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding
Niloofar Azizi, Mohsen Fayyaz, Horst Bischof
TL;DR
This work tackles occlusion in 3D human pose estimation by introducing PerturbPE, a perturbation-based Laplacian eigenvector positional encoding that remains robust when input edges are missing. By applying Rayleigh-Schrödinger Perturbation Theory to repeatedly perturb the graph and averaging the resulting eigenvectors, PerturbPE isolates the consistent, regular part of the eigenbasis and integrates it into a MöbiusGCN backbone without adding parameters. The method achieves notable improvements on Human3.6M (MPJPE down to 32.7 mm from 34.1 mm) and strong generalization on MPI-INF-3DHP (3D PCK up to 84.0 in outdoor settings), particularly under scenarios with up to two missing edges. Overall, PerturbPE enables a single network to handle multiple occlusion patterns more robustly, advancing GCN expressivity under incomplete-graph conditions and offering practical benefits for real-world pose estimation.
Abstract
Understanding human behavior fundamentally relies on accurate 3D human pose estimation. Graph Convolutional Networks (GCNs) have recently shown promising advances, delivering state-of-the-art performance with comparatively lightweight architectures. For graph-structured data, using the eigenvectors of the graph Laplacian matrix as a positional encoding is effective. However, this approach does not specify how to handle input graphs with missing edges. To this end, we propose a novel positional encoding technique, PerturbPE, which applies multiple perturbations to the graph and averages the resulting eigenvectors to extract the consistent and regular component of the eigenbasis. PerturbPE uses Rayleigh-Schrödinger Perturbation Theory (RSPT) to compute the perturbed eigenvectors. Using this encoding improves the robustness and generalizability of the model. Our results support our theoretical findings: for example, we observe a performance improvement of up to $12\%$ on the Human3.6M dataset when occlusion removes one edge. Furthermore, our approach significantly improves performance when two edges are missing, setting a new state of the art.
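
The perturb-and-average idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`laplacian`, `first_order_eigvecs`, `perturb_pe`), the choice of small random symmetric perturbations, the use of only the first-order Rayleigh-Schrödinger correction, and the parameters `eps` and `num_perturbations` are all assumptions made for illustration. The first-order correction used here is $v_i' \approx v_i + \sum_{j \neq i} \frac{v_j^\top E\, v_i}{\lambda_i - \lambda_j} v_j$, which is only valid when the eigenvalues involved are distinct, so near-degenerate pairs are skipped.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A of an undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def first_order_eigvecs(lam, V, E, tol=1e-8):
    """First-order Rayleigh-Schrodinger correction of the eigenvectors of L
    under a symmetric perturbation E. Columns of V are eigenvectors of L.
    Assumption: (near-)degenerate eigenvalue pairs are simply skipped."""
    M = V.T @ E @ V                      # M[j, i] = v_j^T E v_i
    gap = lam[None, :] - lam[:, None]    # gap[j, i] = lam_i - lam_j
    mask = np.abs(gap) > tol             # excludes the diagonal as well
    C = np.zeros_like(M)
    C[mask] = M[mask] / gap[mask]
    return V + V @ C                     # column i: v_i + sum_j C[j, i] v_j

def perturb_pe(n, edges, k=3, num_perturbations=20, eps=0.05, seed=0):
    """Average first-order perturbed eigenvectors over several random
    perturbations and return k non-trivial ones as a positional encoding.
    The random symmetric perturbation is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    lam, V = np.linalg.eigh(laplacian(n, edges))
    acc = np.zeros((n, n))
    for _ in range(num_perturbations):
        R = rng.normal(size=(n, n))
        E = eps * (R + R.T) / 2.0        # small symmetric perturbation
        acc += first_order_eigvecs(lam, V, E)
    avg = acc / num_perturbations
    return avg[:, 1:k + 1]               # drop the constant eigenvector

# Hypothetical 5-joint chain; a real skeleton graph would have more joints.
chain_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
pe = perturb_pe(5, chain_edges, k=3)     # one 3-dimensional code per joint
```

Each row of `pe` would be attached to the corresponding joint as extra positional input. Working from the unperturbed eigendecomposition avoids recomputing a full eigendecomposition for every perturbation, which is the practical appeal of a perturbation-theoretic update in a sketch like this one.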
