Pedestrian Attribute Recognition as Label-balanced Multi-label Learning
Yibo Zhou, Hai-Miao Hu, Yirong Xiang, Xiaokang Zhang, Haotian Wu
TL;DR
This work addresses the pervasive label and semantics imbalances in Pedestrian Attribute Recognition (PAR) by proposing a label-balanced, feature-space re-sampling framework (FRDL) coupled with gradient-driven semantic augmentation (GOAT). FRDL decouples the sampling of each attribute from that of its co-occurring attributes: it trains a feature extractor, keeps it fixed, and then calibrates a per-attribute, label-balanced classifier on balanced feature banks, thereby overcoming the co-occurrence constraints of image-space re-sampling. GOAT complements FRDL by introducing heterogeneous, in-distribution feature augmentations through gradient-driven translations, which behave like Bayesian sampling and reduce feature noise, improving generalization. Empirically, FRDL+GOAT achieves state-of-the-art mean accuracy on PA100k and RAPv1, with robust gains on PETA after addressing data leakage, and does so with minimal extra parameters and computational burden, making it a practical, plug-in approach for real-world multi-label recognition tasks.
Abstract
Because most attributes are scarce, realistic pedestrian attribute datasets exhibit heavily skewed data distributions, which cause two types of model failure: (1) label imbalance: model predictions lean heavily towards the majority labels; (2) semantics imbalance: the model easily overfits the under-represented attributes because of their insufficient semantic diversity. To achieve perfect label balancing, we propose a novel framework that decouples label-balanced data re-sampling from the curse of attribute co-occurrence, i.e., we equalize the sampling prior of one attribute without biasing that of the co-occurring attributes. To diversify attribute semantics and mitigate feature noise, we propose a Bayesian feature augmentation method that introduces true in-distribution novelty. Handling both imbalances jointly, our work achieves the best accuracy on various popular benchmarks and, importantly, does so with a minimal computational budget.
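The idea of equalizing one attribute's sampling prior without disturbing the others can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes features have already been extracted by a frozen backbone, and the helper names `build_feature_banks` and `sample_balanced_batch`, the toy data, and the 50/50 split are all hypothetical choices made for illustration. Because each attribute draws from its own positive and negative feature banks, balancing a rare attribute leaves the sampling of every other attribute untouched, which is not possible when whole images (with all their co-occurring labels) are re-sampled.

```python
import random


def build_feature_banks(features, labels):
    """Split pre-extracted features into per-attribute banks (hypothetical helper).

    features: list of feature vectors, one per image, from a frozen extractor.
    labels:   list of 0/1 rows; labels[i][k] == 1 if image i has attribute k.
    Returns banks[k][y], the features whose label for attribute k equals y.
    """
    num_attrs = len(labels[0])
    banks = [{0: [], 1: []} for _ in range(num_attrs)]
    for feat, row in zip(features, labels):
        for k, y in enumerate(row):
            banks[k][y].append(feat)
    return banks


def sample_balanced_batch(banks, attr, batch_size, rng):
    """Draw a batch for ONE attribute with equal positives and negatives.

    Sampling is with replacement, so a rare class can fill its half of the
    batch. Both banks of the attribute are assumed to be non-empty.
    """
    half = batch_size // 2
    pos = rng.choices(banks[attr][1], k=half)
    neg = rng.choices(banks[attr][0], k=batch_size - half)
    batch = [(f, 1) for f in pos] + [(f, 0) for f in neg]
    rng.shuffle(batch)
    return batch


# Toy data (assumed): attribute 0 is rare (1 of 8 images), attribute 1 is common.
rng = random.Random(0)
labels = [[1, 1]] + [[0, 1]] * 5 + [[0, 0]] * 2
features = [[float(i), 0.5 * float(i)] for i in range(len(labels))]

banks = build_feature_banks(features, labels)
batch_rare = sample_balanced_batch(banks, attr=0, batch_size=8, rng=rng)
batch_common = sample_balanced_batch(banks, attr=1, batch_size=8, rng=rng)

num_pos_rare = sum(y for _, y in batch_rare)
num_pos_common = sum(y for _, y in batch_common)
print(num_pos_rare, num_pos_common)
```

Each per-attribute classifier would then be fitted on batches like these. Note that plain re-sampling only repeats the few available features of a rare attribute, which is the semantic-diversity gap the abstract's feature augmentation is meant to address.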
