Body Segmentation Using Multi-task Learning
Julijan Jug, Ajda Lampe, Vitomir Štruc, Peter Peer
TL;DR
SPD is a multi-task architecture that enhances human body segmentation by jointly learning segmentation, skeleton keypoint estimation, and dense pose prediction on a shared ResNet-101 backbone. The model optimizes a unified loss $L = \lambda_s L_s + \lambda_p L_p + \lambda_d L_d$ with empirically determined weights, and each branch provides contextual information to the segmentation head. Empirical results on the LIP and ATR datasets show SPD outperforms the JPPNet baseline, with ablations demonstrating that both pose and dense-pose tasks contribute to improved segmentation, especially under cross-dataset conditions. The work highlights the practical impact of incorporating anatomical and pose context into segmentation for robust human parsing in applications like virtual try-on and fashion analysis.
Abstract
Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation–Pose–DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.
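As a rough illustration of the multi-task optimization objective, the sketch below combines three per-task loss values into a single weighted sum of the form $L = \lambda_s L_s + \lambda_p L_p + \lambda_d L_d$. The names `LossWeights` and `combined_loss`, the default weights of 1.0, and the example loss values are all hypothetical placeholders; the weights actually used by SPD are determined empirically and are not reproduced here. Plain floats stand in for the per-task losses that a training framework would compute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical placeholder values, not the weights used in the paper.
    seg: float = 1.0    # lambda_s, segmentation head
    pose: float = 1.0   # lambda_p, skeleton keypoint head
    dense: float = 1.0  # lambda_d, dense pose head


def combined_loss(l_seg: float, l_pose: float, l_dense: float,
                  w: LossWeights = LossWeights()) -> float:
    """Weighted sum L = lambda_s*L_s + lambda_p*L_p + lambda_d*L_d."""
    return w.seg * l_seg + w.pose * l_pose + w.dense * l_dense


# Illustrative per-task loss values (made up for this example).
full = combined_loss(0.5, 0.25, 0.125)

# Assumption: zeroing a weight is one simple way to drop an auxiliary task
# from the objective, loosely mimicking an ablation setting.
seg_only = combined_loss(0.5, 0.25, 0.125, LossWeights(pose=0.0, dense=0.0))
```

With all three weights set to 1.0 the auxiliary tasks contribute fully to the objective, while setting `pose` and `dense` to zero reduces it to the segmentation loss alone. Whether the ablations in the paper are implemented this way or by removing the heads entirely is not stated in this section.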
