Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation

Zejun Gu; Zhong-Qiu Zhao; Hao Shen; Zhao Zhang

TL;DR

A Multi-Granular Information-Lossless (MGIL) model is proposed to replace the downsampling layers (e.g., strided convolutions and pooling layers) that discard already scarce information in low-resolution human pose estimation. It outperforms the SOTA methods by 7.7 mAP on COCO and performs well across different input resolutions, backbones, and vision tasks.

Abstract

In real-world applications of human pose estimation, low-resolution input images are frequently encountered when the image acquisition equipment has limited performance or the subject is far from the camera. However, existing state-of-the-art models for human pose estimation perform poorly on low-resolution images. One key reason is the presence of downsampling layers in these models, e.g., strided convolutions and pooling layers, which further reduce the already insufficient image information. Another key reason is that the body skeleton and human kinematic information are not fully utilized. In this work, we propose a Multi-Granular Information-Lossless (MGIL) model that replaces the downsampling layers to address both issues. Specifically, MGIL employs a Fine-grained Lossless Information Extraction (FLIE) module, which prevents the loss of local information. Furthermore, we design a Coarse-grained Information Interaction (CII) module to adequately leverage human body structural information. To efficiently fuse cross-granular information and thoroughly exploit the relationships among keypoints, we further introduce a Multi-Granular Adaptive Fusion (MGAF) mechanism, which assigns weights to features of different granularities based on the content of the image. The model is effective, flexible, and universal, and we show its potential in various vision tasks with comprehensive experiments. It outperforms the SOTA methods by 7.7 mAP on COCO and performs well with different input resolutions, different backbones, and different vision tasks. The code is provided in the supplementary material.
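The two ideas in the abstract that lend themselves to a small illustration are (1) why strided downsampling discards information and (2) what content-dependent weighting of two feature granularities might look like. The sketch below is not the authors' FLIE or MGAF implementation, whose details are not given here. It assumes a generic space-to-depth rearrangement as one example of a resolution reduction that keeps every pixel, and a hypothetical gating rule (softmax over mean activations) as one example of adaptive fusion. All function names are illustrative.

```python
import math

def strided_subsample(img, stride=2):
    """Keep every `stride`-th pixel in each direction; the rest are discarded."""
    return [row[::stride] for row in img[::stride]]

def space_to_depth(img, block=2):
    """Move each block x block patch into block*block channels (no value is lost).
    Assumption: used here only as a generic lossless alternative to striding."""
    h, w = len(img), len(img[0])
    assert h % block == 0 and w % block == 0
    return [
        [
            [img[i + di][j + dj] for di in range(block) for dj in range(block)]
            for j in range(0, w, block)
        ]
        for i in range(0, h, block)
    ]

def depth_to_space(cells, block=2):
    """Invert space_to_depth, showing that the original image is recoverable."""
    h, w = len(cells) * block, len(cells[0]) * block
    img = [[None] * w for _ in range(h)]
    for ci, row in enumerate(cells):
        for cj, chans in enumerate(row):
            for k, v in enumerate(chans):
                img[ci * block + k // block][cj * block + k % block] = v
    return img

def adaptive_fusion(fine, coarse):
    """Fuse two equally long feature vectors with input-dependent weights.
    Hypothetical gating rule: softmax over the mean activation of each branch."""
    scores = [sum(fine) / len(fine), sum(coarse) / len(coarse)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    w_fine, w_coarse = exps[0] / total, exps[1] / total
    fused = [w_fine * f + w_coarse * c for f, c in zip(fine, coarse)]
    return fused, (w_fine, w_coarse)

# A 4x4 toy "image" with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sub = strided_subsample(image)      # 2x2 map, 12 of 16 values dropped
cells = space_to_depth(image)       # 2x2 map with 4 channels per cell
restored = depth_to_space(cells)    # identical to `image`
```

In this toy setting, the strided map keeps only 4 of the 16 pixel values, while the rearranged map has the same reduced spatial size but can be inverted exactly. The fusion weights always sum to 1 and shift toward whichever branch has the larger mean activation for a given input.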

Paper Structure (17 sections, 11 equations, 5 figures, 8 tables)

Introduction
RELATED WORK
Human Pose Estimation
Low-resolution Vision Tasks
METHODS
Overall Framework
Fine-grained Lossless Information Extraction (FLIE) Module
Coarse-grained Information Interaction (CII) Module
Multi-Granular Adaptive Fusion (MGAF) Mechanism
EXPERIMENTS
Datasets and Metrics
Implementation Details
Results in Human Pose Estimation
Results on Other Computer Vision Tasks
Ablation Studies
...and 2 more sections

Figures (5)

Figure 1: Visualization of our proposed MGIL model with low-resolution inputs on the COCO val dataset.
Figure 2: The illustration of the Multi-Granular Information-Lossless (MGIL) Model. The comparison of traditional convolution (b) and dilated convolution (c).
Figure 3: The overall architecture of MGIL-HRNet and MGIL-ResNet.
Figure 4: (a) Overview of our proposed MGIL model. The illustration of SCT unit (b) and MGAF mechanism (c).
Figure 5: Visual results of our model under low-resolution conditions based on the COCO val dataset. From left to right, the resolutions are 128$\times$128, 128$\times$128, 128$\times$128, 64$\times$64, 64$\times$64, and 64$\times$64, respectively.
