Body Segmentation Using Multi-task Learning

Julijan Jug; Ajda Lampe; Vitomir Štruc; Peter Peer

TL;DR

SPD is a multi-task architecture that enhances human body segmentation by jointly learning segmentation, skeleton keypoint estimation, and dense pose prediction on a shared ResNet-101 backbone. The model optimizes a unified loss $L = \lambda_s L_s + \lambda_p L_p + \lambda_d L_d$ with empirically determined weights, and each branch provides contextual information to the segmentation head. Empirical results on the LIP and ATR datasets show SPD outperforms the JPPNet baseline, with ablations demonstrating that both pose and dense-pose tasks contribute to improved segmentation, especially under cross-dataset conditions. The work highlights the practical impact of incorporating anatomical and pose context into segmentation for robust human parsing in applications like virtual try-on and fashion analysis.
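
As a rough illustration of the unified loss, the sketch below combines three per-task loss values into $L = \lambda_s L_s + \lambda_p L_p + \lambda_d L_d$. The names `LossWeights` and `combined_loss` are hypothetical, and the weight and loss values are placeholders: the summary only states that the weights were determined empirically, not what they are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Task weights (lambda_s, lambda_p, lambda_d). Defaults are placeholders,
    not the values used by the authors."""
    segmentation: float = 1.0
    pose: float = 1.0
    dense_pose: float = 1.0


def combined_loss(l_s: float, l_p: float, l_d: float,
                  w: LossWeights = LossWeights()) -> float:
    """Return L = lambda_s * L_s + lambda_p * L_p + lambda_d * L_d."""
    return w.segmentation * l_s + w.pose * l_p + w.dense_pose * l_d


if __name__ == "__main__":
    # Hypothetical per-task loss values for a single training batch.
    weights = LossWeights(segmentation=1.0, pose=0.5, dense_pose=0.25)
    print(combined_loss(0.8, 0.4, 1.2, weights))
```

Setting a weight to zero removes that task from the objective, which is one simple way to picture the ablations over the pose and dense-pose tasks.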

Abstract

Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation-Pose-DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.
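
The shared-backbone layout described in the abstract can be pictured with a minimal data-flow sketch: features are computed once and then passed to each task-specific head. Everything here is a toy stand-in under stated assumptions. `run_shared_backbone`, `toy_backbone`, and `TOY_HEADS` are hypothetical names, and the arithmetic replaces what would be neural network layers in the actual model.

```python
from typing import Callable, Dict, List

Features = List[float]
Head = Callable[[Features], float]


def run_shared_backbone(image: List[float],
                        backbone: Callable[[List[float]], Features],
                        heads: Dict[str, Head]) -> Dict[str, float]:
    """Compute shared features once, then feed them to every task head."""
    shared = backbone(image)  # single pass through the shared backbone
    return {name: head(shared) for name, head in heads.items()}


def toy_backbone(image: List[float]) -> Features:
    """Toy feature extractor (assumption): doubles every input value."""
    return [2.0 * x for x in image]


# Toy heads (assumption): each reduces the shared features to one number.
TOY_HEADS: Dict[str, Head] = {
    "segmentation": sum,
    "pose": max,
    "dense_pose": min,
}

if __name__ == "__main__":
    print(run_shared_backbone([1.0, 2.0, 3.0], toy_backbone, TOY_HEADS))
```

Because all heads read the same features, gradients from every task would update the shared backbone during training, which is the mechanism by which the auxiliary tasks can influence segmentation.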

Paper Structure (18 sections, 10 equations, 11 figures, 2 tables)

Introduction
Related work
Methodology
Model Overview
Segmentation Branch
Pose Estimation Branch
Dense Pose Branch
Training Details
Experiments and results
Datasets
Performance Measures
Segmentation Results and Ablations
Results of Auxiliary Tasks
Qualitative analysis
Conclusion
...and 3 more sections

Figures (11)

Figure 1: This example shows that the pose and dense pose subtasks provide helpful contextual and structural information about the human body. The second image shows the segmentation mask produced by our multi-task model containing segmentation and pose estimation tasks. The third image shows the segmentation mask created by our multi-task model with the additional task of dense pose estimation. We can see that the additional task of dense pose estimation significantly improves the segmentation performance.
Figure 2: High-level architectural diagram of the proposed SPD model. The common ResNet backbone of the SPD model is shared between three specialized model branches designed specifically for human body segmentation, skeleton/pose prediction, and dense pose estimation.
Figure 3: Overview of the segmentation branch of the SPD model. The branch consists of two parts: the first generates an initial segmentation result based on features produced by the backbone model, and the second refines this initial estimate using different types of input information, including information from the other branches.
Figure 4: Overview of the pose estimation branch of the SPD model. The branch consists of two parts: the first generates an initial keypoint prediction based on features produced by the backbone model, and the second refines this initial estimate using different types of input information, including information from the other branches.
Figure 5: High-level diagram of the architecture of the DensePose model. The visualization of the dense pose components is taken from the DensePose article [densepose1].
...and 6 more figures
