What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance

Athena Psalta, Vasileios Tsironis, Konstantinos Karantzalos

TL;DR

We address the interpretability gap in person re-identification by introducing MoSAIC-ReID, a Mixture-of-Experts framework where attribute-specific LoRA experts are gated by an oracle router to enable controlled, attribute-wise attribution. Integrated into a CLIP-based Transformer backbone as a residual MoE in the last layers, MoSAIC-ReID permits principled measurement of each semantic attribute's impact on ReID accuracy while preserving core model capacity. Through GLM, RF-based permutation importance, SHAP values, and hypothesis testing on Market-1501 and DukeMTMC with rich attribute annotations, we find that clothing color (especially lower-body color) and intrinsic attributes are the most influential cues, whereas rare accessories contribute less. The framework provides a rigorous, interpretable methodology for incorporating explicit semantic knowledge into ReID and offers practical guidance for deploying attribute-informed systems across transformer-based backbones.

Abstract

State-of-the-art person re-identification methods achieve impressive accuracy but remain largely opaque, leaving open the question: which high-level semantic attributes do these models actually rely on? We propose MoSAIC-ReID, a Mixture-of-Experts framework that systematically quantifies the importance of pedestrian attributes for re-identification. Our approach uses LoRA-based experts, each linked to a single attribute, and an oracle router that enables controlled attribution analysis. While MoSAIC-ReID achieves competitive performance on Market-1501 and DukeMTMC under the assumption that attribute annotations are available at test time, its primary value lies in providing a large-scale, quantitative study of attribute importance across intrinsic and extrinsic cues. Using generalized linear models, statistical tests, and feature-importance analyses, we reveal which attributes, such as clothing colors and intrinsic characteristics, contribute most strongly, while infrequent cues (e.g., accessories) have limited effect. This work offers a principled framework for interpretable ReID and highlights the requirements for integrating explicit semantic knowledge in practice. Code is available at https://github.com/psaltaath/MoSAIC-ReID

Paper Structure

This paper contains 15 sections, 4 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: MoSAIC-ReID framework. A CLIP-based visual encoder is augmented in its last transformer layers with MoSAIC-ReID modules composed of semantic LoRA experts grouped by attribute type and activated through an oracle router using ground-truth annotations. The resulting attribute-aware CLS token yields attribute-informed ReID performance, which is subsequently analysed through generalized linear models, hypothesis testing and ML-based feature importance to derive quantitative insights on semantic attribute significance for re-identification.
  • Figure 2: Overview of the MoSAIC-ReID architecture which can be integrated within a transformer-based visual encoder. LoRA experts are organized into semantic groups, each aligned with a specific attribute. An oracle router deterministically activates experts based on ground-truth attributes, enabling explicit attribute-aware representation learning. Expert outputs are aggregated with a pooling mechanism and combined via a residual connection, ensuring both the original and attribute-enhanced features contribute to the final embedding.
  • Figure 3: Expert group activation for different attribute types. Left: For single-state binary attributes, a single LoRA expert is activated only if the attribute exists. Middle: For dual-state binary attributes, one of two experts is activated based on the observed attribute state. Right: For multiclass attributes, exactly one expert in a group is activated according to the specific attribute category.
  • Figure 4: Prior probabilities for each value of the manually annotated attributes in the Market-1501 (left) and DukeMTMC (right) datasets. For each dataset, the horizontal bars represent the distribution of attribute values across all annotated images, with color segments indicating the proportion of each category (e.g., gender, clothing type, color, accessories).
  • Figure 5: SHAP scores for feature importance on the Market-1501 (left) and DukeMTMC (right) datasets.
  • ...and 4 more figures
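
The routing and aggregation described in the captions of Figures 2 and 3 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the attribute names in `SCHEMA`, the expert naming scheme, the helper functions, and the toy weights are all hypothetical, and mean pooling is assumed because the captions only say "a pooling mechanism". Which projection inside the transformer layer the LoRA experts attach to is likewise not specified here, so the sketch applies each expert directly to a feature vector.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical attribute schema: (group kind, number of experts in the group).
# Names and category counts are illustrative, not the paper's annotation format.
SCHEMA: Dict[str, Tuple[str, int]] = {
    "hat": ("single", 1),         # single-state binary: expert fires only if present
    "gender": ("dual", 2),        # dual-state binary: one expert per state
    "lower_color": ("multi", 3),  # multiclass: one expert per category
}


def oracle_route(annotations: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Pick at most one expert per attribute group from ground-truth labels."""
    active: Dict[str, Optional[str]] = {}
    for attr, (kind, n_experts) in SCHEMA.items():
        value = annotations[attr]
        if kind == "single":
            active[attr] = f"{attr}/0" if value == 1 else None
        else:
            if not 0 <= value < n_experts:
                raise ValueError(f"invalid value {value} for attribute {attr}")
            active[attr] = f"{attr}/{value}"
    return active


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_delta(x: Vector, a: Matrix, b: Matrix, scale: float = 1.0) -> Vector:
    """Low-rank update scale * B(Ax), with A of shape r x d and B of shape d x r."""
    return [scale * y for y in matvec(b, matvec(a, x))]


def moe_residual(
    x: Vector,
    routes: Dict[str, Optional[str]],
    weights: Dict[str, Tuple[Matrix, Matrix]],
) -> Vector:
    """Mean-pool the active experts' outputs (an assumption) and add them to x."""
    deltas = [lora_delta(x, *weights[name]) for name in routes.values() if name]
    if not deltas:
        return list(x)  # no expert active: the original features pass through
    pooled = [sum(col) / len(deltas) for col in zip(*deltas)]
    return [xi + pi for xi, pi in zip(x, pooled)]


# Toy rank-1 experts for a 2-dimensional feature; the values are arbitrary.
weights = {
    "gender/1": ([[1.0, 0.0]], [[0.5], [0.0]]),
    "lower_color/2": ([[0.0, 1.0]], [[0.0], [1.0]]),
}
routes = oracle_route({"hat": 0, "gender": 1, "lower_color": 2})
out = moe_residual([1.0, 2.0], routes, weights)
```

Because the router is deterministic, an attribute group's contribution can be removed simply by setting its route to `None`, which is the kind of controlled, attribute-wise intervention the framework relies on for its importance analysis.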