Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation

Junjia Han

TL;DR

Greit-HRNet addresses the need for efficient high-resolution pose estimation by introducing grouped channel weighting (GCW) and global spatial weighting (GSW) to maintain weight stability across stages and enhance global spatial information exchange. It further leverages a Large Kernel Stem with Large Kernel Attention (LKA) to enlarge receptive fields without a prohibitive increase in parameters. The method achieves strong pose-estimation performance on MS-COCO and MPII, surpassing other lightweight networks while maintaining substantially lower complexity, and demonstrates clear gains in ablation studies. Overall, Greit-HRNet offers a practical, scalable solution for accurate real-time human pose estimation in resource-constrained scenarios.

Abstract

Because multi-scale features are necessary for human pose estimation, high-resolution networks are widely applied. To improve efficiency, lightweight modules, including channel weighting and spatial weighting methods, have been proposed to replace the costly point-wise convolutions in high-resolution networks. However, these modules neither keep the weights consistent nor capture global spatial information. To address these problems, we present a Grouped lightweight High-Resolution Network (Greit-HRNet), built around a Greit block that combines a grouping method, Grouped Channel Weighting (GCW), with a spatial weighting method, Global Spatial Weighting (GSW). GCW modules group conditional channel weighting to keep the weights stable and preserve high-resolution features as the network deepens, while GSW modules extract global spatial information and exchange information across channels. In addition, we apply the Large Kernel Attention (LKA) method to improve the overall efficiency of Greit-HRNet. Experiments on the MS-COCO and MPII human pose estimation datasets show that Greit-HRNet outperforms other state-of-the-art lightweight networks.
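To make the efficiency motivation concrete, the following sketch compares the multiply count of a point-wise (1x1) convolution with that of plain element-wise channel weighting. It is a rough illustration only: the feature-map size is a hypothetical choice, the function names are not from the paper, and the cost of computing the weights themselves is ignored.

```python
def pointwise_conv_macs(h, w, c_in, c_out):
    """Multiply-accumulates of a 1x1 convolution over an h x w feature map."""
    return h * w * c_in * c_out


def channel_weighting_macs(h, w, c):
    """Multiplies needed to scale each of c channels by one scalar weight."""
    return h * w * c


# Hypothetical feature-map size, chosen only for illustration.
h, w, c = 64, 48, 40

pw = pointwise_conv_macs(h, w, c, c)
cw = channel_weighting_macs(h, w, c)

# When c_in == c_out == c, the ratio of the two costs is exactly c.
ratio = pw // cw
print(pw, cw, ratio)
```

Under these assumptions the point-wise convolution costs `c` times as many multiplies as applying the channel weights, which is why replacing it is attractive in lightweight designs.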

Paper Structure

This paper contains 36 sections, 6 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: The architecture of Greit-HRNet. The stages go deep in the horizontal direction and the branches expand in the vertical direction.
  • Figure 2: The LKS block and Greit block. An LKS block applies two LKA modules (Guo et al., 2023) separately. A Greit block covers two branches where available. The two branches share a GCW module, and each has its own GSW module, so a Greit block contains one GCW module and up to two GSW modules.
  • Figure 3: The transition of feature maps with the deepening of stages. For conditional channel weighting, the number of channels increases rapidly, whereas for our grouped channel weighting it remains stable.
  • Figure 4: Global spatial information extraction and example qualitative results. $\otimes$ denotes matrix multiplication. The first convolution sets the number of channels in the feature map to a hyper-parameter $C'$, and the second convolution reduces this number to 1.
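The Figure 4 caption describes two convolutions (to $C'$ channels, then to 1 channel) followed by a matrix multiplication. The NumPy sketch below shows one plausible reading of that description, not the authors' implementation. Several details are assumptions that the caption does not state: both convolutions are taken to be 1x1, a ReLU is placed between them, the single-channel map is normalized with a softmax over spatial positions, and the matrix product is taken between the flattened input and that map. All function and variable names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def global_spatial_context(x, w1, w2):
    """Pool a (C, H, W) feature map into a (C,) global context vector."""
    c, h, w = x.shape
    # First convolution: C -> C' channels (the ReLU is an assumption).
    hidden = np.maximum(conv1x1(x, w1), 0.0)
    # Second convolution: C' -> 1 channel, flattened to H*W spatial logits.
    logits = conv1x1(hidden, w2).reshape(h * w)
    # Softmax over spatial positions (assumed normalization).
    logits = logits - logits.max()
    attn = np.exp(logits)
    attn /= attn.sum()
    # Matrix multiplication: (C, H*W) @ (H*W,) -> (C,).
    context = x.reshape(c, h * w) @ attn
    return context, attn.reshape(h, w)


rng = np.random.default_rng(0)
C, C_prime, H, W = 8, 4, 6, 5  # illustrative sizes only
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C_prime, C))
w2 = rng.standard_normal((1, C_prime))

context, attn = global_spatial_context(x, w1, w2)
print(context.shape, attn.shape)
```

In this reading, each entry of `context` is a weighted average of one channel over all spatial positions, so every output value depends on the whole feature map rather than on a local window.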