ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model
Jialong Zuo, Yongtai Deng, Mengdan Tan, Rui Jin, Dongyue Wu, Nong Sang, Liang Pan, Changxin Gao
TL;DR
Omni Multi-modal Person Re-identification (OM-ReID) is the task of retrieving a person using a query from any single modality or any combination of modalities. The authors introduce ORBench, a high-quality five-modality dataset (RGB, infrared, color pencil, sketch, and text), and ReID5o, a unified framework that combines a multi-modal tokenizing assembler, a multi-expert router with modality-specific adapters, and a feature mixture, with cross-modal alignment driven by the SDM loss (as used in IRRA) and identity losses. Extensive experiments show that multi-modal queries significantly improve retrieval performance and that ReID5o achieves state-of-the-art results across all modality combinations, supporting both the quality of the dataset and the effectiveness of the method. The work lays a foundation for OM-ReID research and makes the dataset and code publicly available to encourage further exploration of multi-modal person ReID.
Abstract
In real-world scenarios, person re-identification (ReID) is expected to identify a person of interest from a descriptive query, regardless of whether that query is a single modality or a combination of multiple modalities. However, existing methods and datasets remain constrained to a limited set of modalities and fail to meet this requirement. We therefore investigate a new and challenging problem, Omni Multi-modal Person Re-identification (OM-ReID), which aims to achieve effective retrieval with varying multi-modal queries. To address dataset scarcity, we construct ORBench, the first high-quality multi-modal dataset, comprising 1,000 unique identities across five modalities: RGB, infrared, color pencil, sketch, and textual description. The dataset is also notably diverse, for example in its painting perspectives and textual information, and can serve as an ideal platform for follow-up investigations in OM-ReID. Moreover, we propose ReID5o, a novel multi-modal learning framework for person ReID. Through a unified encoding and a multi-expert routing mechanism, it enables synergistic fusion and cross-modal alignment of arbitrary modality combinations in a single model. Extensive experiments verify the advancement and practicality of ORBench. A wide range of models have been evaluated and compared on it, and our proposed ReID5o model gives the best performance. The dataset and code will be made publicly available at https://github.com/Zplusdragon/ReID5o_ORBench.
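
To make "arbitrary modality combinations" concrete, the following minimal sketch enumerates the non-empty query subsets for a given set of modalities. The modality names follow the abstract; the function name `query_combinations` and the choice to treat RGB as the retrieval target (so that only the other four modalities form queries) are illustrative assumptions, not details confirmed by the text above.

```python
from itertools import combinations

# Modality names taken from the abstract.
MODALITIES = ["rgb", "infrared", "color_pencil", "sketch", "text"]


def query_combinations(query_modalities):
    """Return every non-empty subset of the given modalities, smallest first.

    Hypothetical helper for illustration; it is not part of the ReID5o code.
    """
    combos = []
    for size in range(1, len(query_modalities) + 1):
        combos.extend(combinations(query_modalities, size))
    return combos


# Assumption: RGB images are the retrieval target, so queries are built
# from the remaining four modalities.
non_rgb = [m for m in MODALITIES if m != "rgb"]
for combo in query_combinations(non_rgb):
    print(" + ".join(combo))
```

Under that assumption, four query modalities give 2^4 - 1 = 15 distinct query types, each of which a single omni multi-modal model would need to handle.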
