UniParser: Multi-Human Parsing with Unified Correlation Representation Learning
Jiaming Chu, Lei Jin, Junliang Xing, Jian Zhao
TL;DR
UniParser tackles multi-human parsing by unifying instance-level and category-level representations within a cosine-space correlation framework. It introduces a Center Locator, an Instance Feature Space Builder, and a Category Feature Space Builder to learn discriminative instance and category features, then fuses them in an end-to-end, NMS-free pipeline that outputs pixel-level parsing. The approach achieves state-of-the-art results on MHPv2.0 and CIHP (e.g., improvements in AP$^{p}_{50}$, AP$^{p}_{vol}$, and PCP$_{50}$) while reducing inference time and parameter count. The work demonstrates the effectiveness of joint optimization in a unified representation space and underscores the potential for applying correlation learning to other fine-grained, multi-instance vision tasks.
Abstract
Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the output form of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level output, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.
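To make the idea of learning features "within the cosine space" concrete, the following is a minimal sketch of a cosine correlation map between one query feature and a per-pixel feature map. It is an illustration under stated assumptions, not the UniParser implementation: the function name `cosine_correlation`, the random feature map, the choice of sampling the query at a hypothetical center pixel, and the 0.5 threshold are all assumptions introduced here.

```python
import numpy as np

def cosine_correlation(features, query, eps=1e-8):
    """Cosine similarity between a query vector and every pixel feature.

    features: array of shape (C, H, W), one C-dimensional feature per pixel.
    query:    array of shape (C,), e.g. a feature sampled at an instance center.
    Returns an (H, W) map with values in [-1, 1].
    """
    # L2-normalise each pixel feature along the channel axis.
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + eps)
    # L2-normalise the query vector.
    q = query / (np.linalg.norm(query) + eps)
    # Dot product of unit vectors is the cosine similarity.
    return np.einsum("chw,c->hw", f, q)

# Hypothetical inputs: a random 16-channel feature map on an 8x8 grid.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8, 8))

# Assumed center location; in a real model this would come from a detector head.
center = (3, 5)
query = feats[:, center[0], center[1]]

corr = cosine_correlation(feats, query)

# An arbitrary threshold turns the correlation map into a pixel-level mask.
mask = corr > 0.5
```

Because the output is already a per-pixel map, the same correlation form can in principle serve both an instance query and a category query, which is the sense in which a shared output format avoids separate post-processing for each branch.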
