Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval
Yang Xu, Yifan Feng, Yu Jiang
TL;DR
The paper tackles open-set 3D cross-modal retrieval, where unseen categories undermine methods that rely on training-set priors. It introduces the Structure-Aware Residual-Center Representation (SRCR) framework, which combines Residual-Center Embedding (RCE) via nested auto-encoders with Hierarchical Structure Learning (HSL) via a heterogeneous hypergraph and memory-based alignment, enabling self-supervised generalization to unseen categories. Key contributions include (i) a residual-center strategy that decouples modality centers from category priors, (ii) a hierarchical hypergraph that captures inter-modality, intra-object, and implicit-category relations, and (iii) comprehensive experiments on four open-set benchmarks showing consistent improvements over state-of-the-art methods. The approach advances practical open-set 3D cross-modal retrieval by reducing center deviation and leveraging high-order correlations, with potential impact on robotics, medicine, and related fields where unseen categories are common.
Abstract
Existing 3D cross-modal retrieval methods rely heavily on category distribution priors from the training set, which reduces their effectiveness on unseen categories in open-set environments. To tackle this problem, we propose the Structure-Aware Residual-Center Representation (SRCR) framework for self-supervised open-set 3D cross-modal retrieval. To address the center deviation caused by category distribution differences, we learn a Residual-Center Embedding (RCE) for each object with nested auto-encoders, rather than mapping objects directly to modality or category centers. In addition, we introduce Hierarchical Structure Learning (HSL), which leverages high-order correlations among objects for generalization by constructing a heterogeneous hypergraph based on hierarchical inter-modality, intra-object, and implicit-category correlations. Extensive experiments and ablation studies on four benchmarks demonstrate the superiority of the proposed framework over state-of-the-art methods.
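
The abstract does not specify how the hypergraph is built, so the following is only a minimal sketch of how a vertex-by-hyperedge incidence matrix could be assembled from two of the correlation types named above. It makes several assumptions that are not from the paper: each vertex is one modality feature of one object, an intra-object hyperedge joins all modality vertices of the same object, and implicit-category correlations are approximated by k-nearest-neighbour groups in feature space. The function names, the toy features, and the choice of k are all hypothetical, and inter-modality hyperedges are not modelled here.

```python
from math import dist

def intra_object_hyperedges(object_ids):
    """One hyperedge per object, joining all of its modality vertices (assumed rule)."""
    edges = {}
    for vertex, obj in enumerate(object_ids):
        edges.setdefault(obj, set()).add(vertex)
    return list(edges.values())

def knn_hyperedges(features, k):
    """One hyperedge per vertex: the vertex plus its k nearest neighbours.

    Used here as a stand-in for implicit-category correlations (an assumption).
    """
    n = len(features)
    edges = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: dist(features[i], features[j]))[:k]
        edges.append({i, *nearest})
    return edges

def incidence_matrix(num_vertices, hyperedges):
    """H[v][e] = 1 if vertex v belongs to hyperedge e, else 0."""
    return [[1 if v in e else 0 for e in hyperedges] for v in range(num_vertices)]

# Toy data: 3 objects, 2 modality features each (values are illustrative only).
object_ids = [0, 0, 1, 1, 2, 2]
features = [
    (0.0, 0.0), (0.1, 0.0),   # object 0
    (0.0, 0.2), (5.0, 5.0),   # object 1
    (5.1, 5.0), (5.0, 5.3),   # object 2
]

intra = intra_object_hyperedges(object_ids)
implicit = knn_hyperedges(features, k=1)
H = incidence_matrix(len(features), intra + implicit)

for row in H:
    print(row)
```

Stacking the two hyperedge groups column-wise gives a single heterogeneous incidence matrix; a hypergraph convolution or any other structure-learning step would then operate on `H`, which is outside the scope of this sketch.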
