
Graph-Boosted Attentive Network for Semantic Body Parsing

Tinghuai Wang, Huiling Wang

TL;DR

The paper addresses fine-grained semantic body parsing in unconstrained multi-person scenes, where occlusions and confusion between body parts are common. It introduces a CNN with semantic and contour attention branches and couples it to a pose-aware graphical model. This model integrates multi-scale pose context via geodesic distances on superpixels and is solved through a convex optimization framework that jointly updates pixel and superpixel likelihoods. On the Pascal Person-Part dataset, the approach achieves a state-of-the-art mean IoU of $68.55\%$. It surpasses strong top-down and bottom-up baselines, which demonstrates the value of combining higher-level pose configuration with local semantic cues. The framework improves boundary localization and disambiguates body parts under challenging poses and occlusions, making it well suited to real-world multi-person parsing.
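The TL;DR names geodesic distances on superpixels as the carrier of pose context. The summary does not give the paper's edge weights or seeding rule, so the following is a minimal sketch under stated assumptions. Superpixels form an undirected adjacency graph. Each edge costs the Euclidean distance between hypothetical per-superpixel feature vectors (for example, mean colour). The seeds are the superpixels covered by one estimated pose part. A multi-source Dijkstra search then gives every superpixel its geodesic distance to that part. The names `geodesic_distances` and `pose_affinity` are illustrative, not the authors' code.

```python
import heapq
import math


def geodesic_distances(n_nodes, edges, features, seeds):
    """Multi-source Dijkstra over a superpixel adjacency graph (illustrative sketch).

    n_nodes:  number of superpixels.
    edges:    iterable of (a, b) pairs of adjacent superpixels (undirected).
    features: per-superpixel feature vectors; the choice of feature is an assumption.
    seeds:    superpixel indices covered by an estimated pose part (e.g. a limb).
    Returns the geodesic distance from the nearest seed to every superpixel.
    """
    adj = [[] for _ in range(n_nodes)]
    for a, b in edges:
        # Edge cost: feature difference, so crossing a strong appearance change is expensive
        w = math.dist(features[a], features[b])
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = [math.inf] * n_nodes
    heap = []
    for s in seeds:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pose_affinity(dist, sigma=1.0):
    """Hypothetical soft pose prior: affinity decays exponentially with geodesic distance."""
    return [math.exp(-d / sigma) for d in dist]


# Toy chain of 4 superpixels: two dark, then two bright; the pose part covers superpixel 0
features = [(0.0,), (0.0,), (1.0,), (1.0,)]
edges = [(0, 1), (1, 2), (2, 3)]
dist = geodesic_distances(4, edges, features, seeds=[0])
print(dist)  # → [0.0, 0.0, 1.0, 1.0]
```

Distances stay small inside an appearance-coherent region and jump across boundaries, so a prior such as $\exp(-d/\sigma)$ favours the part label within the region the pose touches. One plausible reading of "multi-scale" is repeating this over several superpixel segmentations. That interpretation is also an assumption.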

Abstract

Human body parsing remains a challenging problem in natural scenes due to multi-instance and inter-part semantic confusions as well as occlusions. This paper proposes a novel approach to decomposing multiple human bodies into semantic part regions in unconstrained environments. Specifically, we propose a convolutional neural network (CNN) architecture that comprises novel semantic and contour attention mechanisms across the feature hierarchy to resolve the semantic ambiguities and boundary localization issues of semantic body parsing. We further propose to encode the estimated pose as higher-level contextual information, which is combined with local semantic cues in a novel graphical model in a principled manner. In this model, the lower-level semantic cues can be recursively updated by propagating higher-level contextual information from the estimated pose across the graph, and vice versa, so as to mitigate errors in both the pose estimates and the pixel-level predictions. We further propose an optimization technique to derive the solutions efficiently. Our method achieves state-of-the-art results on the challenging Pascal Person-Part dataset.
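The abstract states that pixel-level cues and pose-derived context update each other across the graph, with an optimization technique deriving the solution. The summary does not give the paper's energy, so the sketch below is a hypothetical toy and not the authors' model. It uses a convex quadratic energy that ties each pixel's likelihood $x_i$ to its CNN unary $u_i$ and to its superpixel's likelihood $y_s$, and ties $y_s$ to a pose prior $p_s$. Setting each block gradient to zero gives $y_s = (\mu p_s + \lambda \sum_{i \in s} x_i)/(\mu + \lambda |s|)$ and $x_i = (u_i + \lambda y_{s(i)})/(1 + \lambda)$. Alternating these exact block minimisers (block-coordinate descent) converges to the global minimum because the energy is strictly convex. The function name, the weights `lam` and `mu`, and the omission of pairwise smoothness terms are all illustrative assumptions.

```python
import numpy as np


def joint_refine(pixel_unary, sp_labels, pose_prior, lam=1.0, mu=1.0, iters=50):
    """Block-coordinate descent on a hypothetical convex energy (not the paper's exact model):

        E(x, y) = sum_i ||x_i - u_i||^2
                + lam * sum_i ||x_i - y_s(i)||^2
                + mu  * sum_s ||y_s - p_s||^2

    pixel_unary: (N, C) CNN class likelihoods u_i per pixel.
    sp_labels:   (N,) superpixel index s(i) of each pixel.
    pose_prior:  (S, C) pose-derived class prior p_s per superpixel.
    Returns refined pixel likelihoods x (N, C) and superpixel likelihoods y (S, C).
    """
    unary = np.asarray(pixel_unary, dtype=float)
    prior = np.asarray(pose_prior, dtype=float)
    counts = np.bincount(sp_labels, minlength=prior.shape[0]).astype(float)  # |s| per superpixel
    x = unary.copy()
    y = prior.copy()
    for _ in range(iters):
        # Superpixel step: exact minimiser of E over y with x fixed
        sums = np.zeros_like(y)
        np.add.at(sums, sp_labels, x)  # sum of member-pixel likelihoods per superpixel
        y = (mu * prior + lam * sums) / (mu + lam * counts)[:, None]
        # Pixel step: exact minimiser of E over x with y fixed
        x = (unary + lam * y[sp_labels]) / (1.0 + lam)
    return x, y


# Toy usage: 6 pixels, 3 part classes, 2 superpixels
rng = np.random.default_rng(0)
u = rng.dirichlet(np.ones(3), size=6)
s = np.array([0, 0, 0, 1, 1, 1])
p = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
x, y = joint_refine(u, s, p)
```

Each step is a closed-form weighted average, so likelihood rows that sum to one stay normalised. Raising `mu` trusts the pose prior more. Raising `lam` pulls pixels toward their superpixel consensus, which is one simple way for context to flow both up and down the hierarchy.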

Paper Structure

This paper contains 13 sections, 19 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Illustration of the proposed network architecture.
  • Figure 2: Illustration of the proposed semantic attention module (left) and contour attention module (right).
  • Figure 3: Qualitative results of our algorithm on the Pascal Person-Part dataset.