A Quantitative Evaluation of the Expressivity of BMI, Pose and Gender in Body Embeddings for Recognition and Identification
Basudha Pal, Siyuan Huang, Rama Chellappa
TL;DR
This work investigates how body attributes are encoded in vision-language-based person ReID representations by extending the notion of expressivity to the ReID domain and quantifying attribute information via Mutual Information Neural Estimation (MINE). The authors apply this MI-based expressivity framework to ViT-based ReID models (e.g., SemReID, PFD, DC-Former), feeding an augmented feature–attribute input to a neural estimator that measures I_theta(F,A) across layers and training epochs. The key findings are that BMI consistently exhibits the highest expressivity, especially in deeper layers; yaw and pitch are more prominent in mid-layers and tend to be suppressed with training; and gender remains moderately entangled but relatively stable. The work provides a principled, post-hoc explanation tool for attribute-driven correlations in ReID, with practical implications for fairness and robustness in open-set deployment, while acknowledging that MI-based estimates may be influenced by attribute entropy.
Abstract
Person Re-identification (ReID) systems that match individuals across images or video frames are essential in many real-world applications. However, existing methods are often influenced by attributes such as gender, pose, and body mass index (BMI), which vary in unconstrained settings and raise concerns related to fairness and generalization. To address this, we extend the notion of expressivity, defined as the mutual information between learned features and a specific attribute, to ReID, and estimate it with a secondary neural network that quantifies how strongly each attribute is encoded. Applying this framework to three ReID models, we find that BMI consistently shows the highest expressivity in the final layers, indicating its dominant role in recognition. In the last attention layer, attributes are ranked as BMI > pitch > gender > yaw, revealing their relative influence on representation learning. Expressivity values also evolve across layers and training epochs, reflecting a dynamic encoding of attributes. These findings demonstrate the central role of body attributes in ReID and establish a principled approach for uncovering attribute-driven correlations.
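
As an illustration of the quantity being estimated, the following is a minimal sketch of the Donsker-Varadhan lower bound on mutual information, which is the objective that MINE maximizes over a critic network. It is not the authors' implementation. To stay self-contained it makes several assumptions: the feature F and attribute A are synthetic, scalar, unit-variance Gaussians with correlation rho; the trained critic network is replaced by the closed-form optimal critic for that toy distribution; and the names dv_lower_bound, gaussian_log_ratio, and sample_pairs are hypothetical. Samples from the product of marginals are approximated by shuffling attributes against features, as is commonly done in MINE-style estimators.

```python
import math
import random


def dv_lower_bound(critic, joint_pairs, rng):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)]."""
    feats = [f for f, _ in joint_pairs]
    attrs = [a for _, a in joint_pairs]
    # Shuffling the attributes breaks the pairing with the features, which
    # approximates sampling from the product of marginals P_F x P_A.
    shuffled = attrs[:]
    rng.shuffle(shuffled)
    joint_term = sum(critic(f, a) for f, a in joint_pairs) / len(joint_pairs)
    marg_term = sum(math.exp(critic(f, a)) for f, a in zip(feats, shuffled)) / len(feats)
    return joint_term - math.log(marg_term)


def gaussian_log_ratio(rho):
    """Closed-form log p(f, a) / (p(f) p(a)) for unit-variance Gaussians.

    Assumption: this stands in for the trained critic network; in MINE the
    critic is learned by gradient ascent on the bound above.
    """
    def critic(f, a):
        quad = (rho * rho * (f * f + a * a) - 2.0 * rho * f * a) / (2.0 * (1.0 - rho * rho))
        return -0.5 * math.log(1.0 - rho * rho) - quad
    return critic


def sample_pairs(rho, n, rng):
    """Draw n (feature, attribute) pairs with correlation rho (toy data)."""
    pairs = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)
        a = rho * f + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((f, a))
    return pairs


rng = random.Random(0)
rho = 0.5
pairs = sample_pairs(rho, 20000, rng)
estimate = dv_lower_bound(gaussian_log_ratio(rho), pairs, rng)
true_mi = -0.5 * math.log(1.0 - rho * rho)  # closed-form MI for this toy case
print(f"DV estimate: {estimate:.3f} nats, closed-form MI: {true_mi:.3f} nats")
```

With the optimal critic the bound is tight, so the estimate should land close to the closed-form value. In the ReID setting the critic would instead be a small network trained on concatenated (feature, attribute) inputs, and the maximized bound would serve as the expressivity estimate I_theta(F,A).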
