Pedestrian Attribute Recognition as Label-balanced Multi-label Learning

Yibo Zhou, Hai-Miao Hu, Yirong Xiang, Xiaokang Zhang, Haotian Wu

TL;DR

This work addresses the pervasive label and semantics imbalances in Pedestrian Attribute Recognition (PAR) by proposing a label-balanced, feature-space re-sampling framework (FRDL) coupled with gradient-driven semantic augmentation (GOAT). FRDL decouples the sampling of each attribute from that of its co-occurring attributes: it trains a feature extractor, freezes it, and then fits a label-balanced classifier for each attribute on balanced feature banks, thereby sidestepping the co-occurrence constraints that make label-balanced re-sampling infeasible in image space. GOAT complements FRDL by introducing heterogeneous, in-distribution feature augmentations through gradient-driven translations, which behave like Bayesian sampling and reduce feature noise, improving generalization. Empirically, FRDL+GOAT achieves state-of-the-art mean accuracy on PA100k and RAPv1 and shows consistent gains on PETA once its data leakage is addressed, while adding few parameters and little computational overhead. This makes it a practical, plug-in approach for real-world multi-label recognition tasks.
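A minimal sketch of the second FRDL stage as it reads from this summary, assuming NumPy: features produced by an already-trained, frozen extractor are stored in a bank, and each attribute's linear head is fit on batches that draw equally from that attribute's positives and negatives. The function names, the logistic-regression head, plain gradient descent and all hyperparameters are illustrative assumptions, not the authors' implementation, and GOAT is not modeled here.

```python
import numpy as np


def balanced_indices(labels_k, n_per_class, rng):
    """Draw equal numbers of positive and negative bank indices for one attribute.

    Sampling is with replacement, so rare positives are repeated as needed.
    Assumes the attribute has at least one positive and one negative sample.
    """
    pos = np.flatnonzero(labels_k == 1)
    neg = np.flatnonzero(labels_k == 0)
    return np.concatenate([
        rng.choice(pos, size=n_per_class, replace=True),
        rng.choice(neg, size=n_per_class, replace=True),
    ])


def train_balanced_heads(features, labels, n_per_class=256, steps=200, lr=0.5, seed=0):
    """Fit one logistic-regression head per attribute on label-balanced feature re-samples.

    features: (N, D) array from a frozen feature extractor (the hypothetical "feature bank").
    labels:   (N, K) binary attribute matrix.
    Returns weights W of shape (K, D) and biases b of shape (K,).
    """
    rng = np.random.default_rng(seed)
    _, d = features.shape
    n_attrs = labels.shape[1]
    W = np.zeros((n_attrs, d))
    b = np.zeros(n_attrs)
    for k in range(n_attrs):
        # Each attribute draws its own balanced batch; other attributes' labels are
        # never consulted, so balancing attribute k leaves their priors untouched.
        for _ in range(steps):
            idx = balanced_indices(labels[:, k], n_per_class, rng)
            x, y = features[idx], labels[idx, k]
            p = 1.0 / (1.0 + np.exp(-(x @ W[k] + b[k])))  # sigmoid of the logits
            grad = p - y  # gradient of BCE w.r.t. the logit
            W[k] -= lr * x.T @ grad / len(idx)
            b[k] -= lr * grad.mean()
    return W, b
```

Prediction for every attribute is then `features @ W.T + b > 0`. Because only the heads are trained in this stage, re-sampling the bank is cheap compared with re-running the backbone.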

Abstract

Rooted in the scarcity of most attributes, realistic pedestrian attribute datasets exhibit unduly skewed data distributions, which give rise to two types of model failure: (1) label imbalance: model predictions lean heavily towards the majority labels; (2) semantics imbalance: the model easily overfits the under-represented attributes because of their insufficient semantic diversity. To achieve perfect label balancing, we propose a novel framework that decouples label-balanced data re-sampling from the curse of attribute co-occurrence, i.e., we equalize the sampling prior of an attribute without biasing those of its co-occurring attributes. To diversify attribute semantics and mitigate feature noise, we propose a Bayesian feature augmentation method that introduces genuinely in-distribution novelty. Handling both imbalances jointly, our work achieves the best accuracy on several popular benchmarks and, importantly, does so with a minimal computational budget.
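The co-occurrence problem can be seen on toy numbers. In the hypothetical dataset below, attribute A is positive in 10% of images and attribute B in 40%, and every A-positive image is also B-positive. Re-weighting whole images so that A becomes balanced necessarily drags B's prior along with it. The helper names and data are illustrative only, not taken from the paper.

```python
import numpy as np


def label_means(labels, weights=None):
    """Per-attribute label mean (positive rate) under optional per-image sampling weights."""
    labels = np.asarray(labels, dtype=float)
    if weights is None:
        weights = np.ones(len(labels))
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ labels


def balance_weights(attr):
    """Image-level weights that give one binary attribute a 50/50 sampling prior."""
    attr = np.asarray(attr)
    n_pos = attr.sum()
    n_neg = len(attr) - n_pos
    return np.where(attr == 1, 1.0 / (2 * n_pos), 1.0 / (2 * n_neg))


# Hypothetical toy dataset: 1000 images, two attributes.
n = 1000
labels = np.zeros((n, 2), dtype=int)
labels[:100, 0] = 1  # attribute A: 10% positive
labels[:400, 1] = 1  # attribute B: 40% positive, including every A-positive image

before = label_means(labels)                                 # A = 0.10, B = 0.40
after = label_means(labels, balance_weights(labels[:, 0]))   # A = 0.50, B is about 0.667
print("before:", before.round(3), "after:", after.round(3))
```

Balancing A in image space pushes B from 0.40 to about 0.67, so B becomes imbalanced in the opposite direction. Drawing a separate balanced sample per attribute from a feature bank avoids this coupling, because each attribute's sample is chosen from its own labels alone.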

Paper Structure

This paper contains 24 sections, 1 theorem, 9 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Proposition 3.1

Eq. (augloss) is upper-bounded by the optimal feature-denoising BCE loss, where $\sigma^k$ denotes the feature noise rate of attribute $k$.

Figures (6)

  • Figure 1: The dominance of negative labels in PAR datasets, and mean accuracy as a function of the label mean. In PETA [2014Pedestrian], 66% of attributes occur with a frequency under 0.1, compared with 57% for RAP [li2016richly]. Label imbalance is also the main performance bottleneck of contemporary PAR models, which are markedly brittle on attributes with label mean $\le$ 0.1.
  • Figure 2: Schematic presentation of the main idea of FRDL. Although DL cannot be naively implemented for PAR because label-balanced image re-sampling (Eq. (condition)) is unsatisfiable, its improved form, FRDL, is workable and thus serves as a better drop-in substitute for LIR.
  • Figure 3: Per-attribute mA gain of our method over the baseline on PA100k. Attributes are sorted in decreasing order of the absolute difference between their label mean and 0.5 (red dashed line), which quantifies their label imbalance.
  • Figure 4: (a): We contrast the representation quality of backbones learned with various label-balancing ratios $\gamma$ on PA100k. As $\gamma$ goes from 0 to 1, the feature extractor transitions from that of a plain baseline to that of the loss re-weighted model. The y-axis shows the mA of a classifier re-trained on each of these converged feature extractors, which serves to compare their feature quality. $rS$ denotes that the classifier is re-trained by Stage #2 of FRDL, while $rW$ denotes that the classifier is trained by loss re-weighting. (b): The mA of different feature extractor + classifier combinations on PA100k. For PAR, this shows that: (1) label balancing hinders learning a performant feature extractor; (2) label-balanced feature re-sampling yields a noticeably better classifier; (3) LIR should lie between 84.42% and 87.77% mA, falling behind FRDL (88.53%).
  • Figure 5: Proof-of-concept experiments for GOAT. t-SNE maps of the augmented feature distributions for (a) ISDA and (b) GOAT ($\times$ denotes the original feature). (c): Posterior variation along the rays corresponding to different eigenvectors of the GOAT translation covariance matrix. (d): On PA100k, the distributions of PDF scores for features from the test data, ISDA augmentations and GOAT augmentations.
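Figures 1, 3 and 4 rest on a few per-attribute statistics: the label mean (positive rate), the share of attributes below a rarity threshold, the ordering by |label mean - 0.5|, and the label-based mean accuracy (mA). The sketch below assumes NumPy and the standard label-based PAR definition of mA, the average over attributes of (TPR + TNR) / 2; the function names and the toy labels are hypothetical.

```python
import numpy as np


def label_stats(labels, rare_thresh=0.1):
    """Per-attribute label means, share of rare attributes, and imbalance ordering."""
    means = labels.mean(axis=0)
    rare_fraction = float((means < rare_thresh).mean())
    # Order attributes by |label mean - 0.5|, most imbalanced first (as in Figure 3).
    order = np.argsort(-np.abs(means - 0.5), kind="stable")
    return means, rare_fraction, order


def mean_accuracy(pred, labels):
    """Label-based mA: mean over attributes of (TPR + TNR) / 2.

    Assumes every attribute has at least one positive and one negative label.
    """
    pred = pred.astype(bool)
    labels = labels.astype(bool)
    tpr = (pred & labels).sum(axis=0) / labels.sum(axis=0)
    tnr = (~pred & ~labels).sum(axis=0) / (~labels).sum(axis=0)
    return float(((tpr + tnr) / 2).mean())


# Hypothetical toy labels: 20 images, 3 attributes with positive rates 0.05, 0.5, 0.2.
labels = np.zeros((20, 3), dtype=int)
labels[:1, 0] = 1
labels[:10, 1] = 1
labels[:4, 2] = 1

means, rare_fraction, order = label_stats(labels)
all_negative = np.zeros_like(labels)  # a predictor fully biased toward majority labels
print(means, rare_fraction, order, mean_accuracy(all_negative, labels))
```

Under this definition, a predictor that always outputs the negative label scores exactly 0.5 mA however rare an attribute is, whereas plain accuracy on a 5%-positive attribute would reward it with 0.95. This is why mA exposes the majority-label bias described in Figure 1.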

Theorems & Definitions (1)

  • Proposition 3.1