Refinement Module based on Parse Graph for Human Pose Estimation

Shibang Liu, Xuemei Xie

TL;DR

RMPG adaptively refines feature maps through recursive top-down decomposition of feature maps and bottom-up composition of sub-node feature maps with context information, thereby improving the accuracy of joint inference.

Abstract

Parse graphs have been widely used in Human Pose Estimation (HPE) to model the hierarchical structure and context relations of the human body. However, such methods often suffer from parameter redundancy. More importantly, they rely on predefined network structures, which limits their use in other methods. To address these issues, we propose a new context relation and hierarchical structure modeling module, RMPG (Refinement Module based on Parse Graph). RMPG adaptively refines feature maps through recursive top-down decomposition of feature maps and bottom-up composition of sub-node feature maps with context information. Through recursive hierarchical composition, RMPG fuses local details and global semantics into more structured feature representations, accompanied by context information, thereby improving the accuracy of joint inference. RMPG can be flexibly embedded as a plug-in into various mainstream HPE networks. Moreover, by supervising sub-node feature maps, RMPG learns the context relations and hierarchical structure between different body parts with fewer parameters. Extensive experiments show that RMPG improves performance across different architectures while effectively modeling hierarchical and context relations of the human body with fewer parameters. The RMPG code can be found at https://github.com/lushbng/RMPG.
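
The decompose-then-compose recursion described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `refine`, the use of a flat list of channel values as the "feature map", the sibling-mean "context" term, and the residual addition are all assumptions chosen only to make the control flow concrete. The actual operators are defined in the paper and the linked repository.

```python
from typing import List, Sequence


def refine(feature: List[float], groups: Sequence[int]) -> List[float]:
    """Toy recursion: split top-down, merge bottom-up (hypothetical operators)."""
    if not groups:
        # Leaf node: returned unchanged in this toy version.
        return list(feature)
    g = groups[0]
    if g < 1 or len(feature) % g != 0:
        raise ValueError("feature length must be divisible by the group size")
    step = len(feature) // g
    # Top-down decomposition: split into g sub-node features and recurse.
    children = [
        refine(feature[i * step:(i + 1) * step], groups[1:]) for i in range(g)
    ]
    # Bottom-up composition: each child receives a placeholder "context"
    # term (the mean of its siblings), then the children are concatenated.
    composed: List[float] = []
    for i, child in enumerate(children):
        siblings = [c for j, c in enumerate(children) if j != i]
        if siblings:
            context = [sum(vals) / len(siblings) for vals in zip(*siblings)]
        else:
            context = [0.0] * len(child)
        composed.extend(x + c for x, c in zip(child, context))
    # Assumed residual connection: add the composed result to the input.
    return [f + r for f, r in zip(feature, composed)]


print(refine([1.0, 2.0, 3.0, 4.0], [2]))  # [5.0, 8.0, 7.0, 10.0]
```

The output always has the same length as the input, which is the property that lets a module of this shape be dropped between existing layers as a plug-in.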

Paper Structure (14 sections, 12 equations, 7 figures, 7 tables, 1 algorithm)

This paper contains 14 sections, 12 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) The overview of RMPG. Dashed lines indicate context relations among sub-feature maps. (b) The parse graph of body structure, from PGBS. The human body is partitioned into five parts (limbs and torso) and structured into three hierarchical levels (body, parts, joints). Black dashed lines indicate context relations between sub-structures.
  • Figure 2: An example of RMPG and its expansion. RMPG supports two types of structural expansion: breadth expansion, which adds more child nodes to a parent at the same level and thus increases the number of nodes in that layer, and depth expansion, which increases the number of hierarchical levels.
  • Figure 3: (a) The backbone network extracts the initial feature $F_0$. (b) The feature $F_0$ is supervised by body heatmaps (Note: Results of dimension-processed feature maps are supervised; the same applies below). Two supervised RMPG$_\text{s}$ (red dashed boxes indicate internal supervision in RMPG) model the context relations between body parts and those between joints, respectively. Finally, the unsupervised gray RMPG$_\text{u}$ refines the feature map $F_2$. (c) In RMPG, supervision for body parts and joints occurs at the final composition stage, differing only in the specific part or joint labels used.
  • Figure 4: The relationship between the depth of RMPG, for an input of size $L\times C$ ($L=64\times48$, $C=256$), and two factors: parameter count and computational complexity. The settings of $\mathcal{G}$ are sequentially $[g_d,\cdots,g_1]$, where $g_i=2$, $i\in\{1,\cdots,d\}$, and $d=1,2,\cdots,7$, corresponding to depth on the horizontal axis. $\parallel$ denotes the spatial decomposition of RMPG, and the absence of $\parallel$ denotes channel decomposition.
  • Figure 5: The relationship between the number of nodes (breadth) in RMPG, for an input of size $L\times C$ ($L=64\times48$, $C=256$), and two factors: parameter count and computational complexity. $\mathcal{G}$ is set to $[2^n,2]$ for $n=1,2,\dots,7$, corresponding to nodes on the horizontal axis. $\parallel$ denotes the spatial decomposition of RMPG, and the absence of $\parallel$ denotes channel decomposition.
  • ...and 2 more figures
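
To make the $\mathcal{G}$ settings quoted in the captions of Figures 4 and 5 concrete, the following sketch enumerates them and counts the nodes each one produces per level. The helper `level_sizes` is hypothetical, and it assumes that the list is read left to right starting from the split applied to the root. The captions do not state this ordering, so only the final leaf count, which does not depend on the order, should be treated as reliable.

```python
from itertools import accumulate
from operator import mul
from typing import List, Sequence


def level_sizes(groups: Sequence[int]) -> List[int]:
    """Nodes per level of the tree: 1 root, then running products of the splits.

    Assumption: groups[0] is the split applied to the root.
    """
    return [1] + list(accumulate(groups, mul))


# Figure 4 (depth): G = [g_d, ..., g_1] with every g_i = 2, for d = 1..7.
depth_settings = [[2] * d for d in range(1, 8)]

# Figure 5 (breadth): G = [2^n, 2] for n = 1..7.
breadth_settings = [[2 ** n, 2] for n in range(1, 8)]

for setting in depth_settings + breadth_settings:
    sizes = level_sizes(setting)
    print(setting, "levels:", sizes, "leaves:", sizes[-1])
```

Under this reading, the deepest depth setting ($d=7$) yields 128 leaf nodes, and the widest breadth setting ($n=7$) yields 256.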