Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval

Yang Xu; Yifan Feng; Yu Jiang

TL;DR

The paper tackles open-set 3D cross-modal retrieval, where unseen categories undermine methods that rely on training priors. It introduces Structure-Aware Residual-Center Representation (SRCR), which combines Residual-Center Embedding (RCE) via nested auto-encoders with Hierarchical Structure Learning (HSL) via a heterogeneous hypergraph and memory-based alignment, enabling self-supervised generalization to unseen categories. Key contributions include (i) a residual-center strategy that decouples modality centers from category priors, (ii) a hierarchical hypergraph capturing inter-modality, intra-object, and implicit-category relations, and (iii) comprehensive experiments on four open-set benchmarks showing consistent improvements over state-of-the-art methods. The approach advances practical open-set 3D cross-modal retrieval by reducing center deviation and leveraging high-order correlations, with potential impact on robotics, medicine, and related fields where unseen categories are common.

Abstract

Existing 3D cross-modal retrieval methods rely heavily on category distribution priors within the training set, which reduces their effectiveness on unseen categories in open-set environments. To tackle this problem, we propose the Structure-Aware Residual-Center Representation (SRCR) framework for self-supervised open-set 3D cross-modal retrieval. To address the center deviation caused by category distribution differences, we generate a Residual-Center Embedding (RCE) for each object with nested auto-encoders, rather than directly mapping objects to modality or category centers. In addition, we apply Hierarchical Structure Learning (HSL) to leverage the high-order correlations among objects for generalization, constructing a heterogeneous hypergraph based on hierarchical inter-modality, intra-object, and implicit-category correlations. Extensive experiments and ablation studies on four benchmarks show that the proposed framework outperforms state-of-the-art methods.
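As a rough illustration of the kind of hypergraph structure the abstract describes, the sketch below builds an incidence matrix from two hyperedge types. It is not the paper's construction: the function names `build_hyperedges` and `incidence_matrix`, the toy features, and the use of k-nearest-neighbour groups as a stand-in for implicit-category correlations are all assumptions made for illustration, and inter-modality hyperedges are omitted. The features are assumed to already live in a shared embedding space.

```python
from collections import defaultdict
from math import dist


def build_hyperedges(nodes, k=1):
    """nodes: list of (object_id, modality, feature_vector) tuples.

    Returns a list of hyperedges, each a sorted list of node indices.
    Hypothetical helper, not taken from the paper.
    """
    # Intra-object hyperedges: one per object, joining all of its modality nodes.
    by_object = defaultdict(list)
    for idx, (obj, _modality, _feat) in enumerate(nodes):
        by_object[obj].append(idx)
    edges = [sorted(members) for members in by_object.values()]

    # Neighbourhood hyperedges (assumed proxy for implicit-category relations):
    # each node is grouped with its k nearest neighbours in feature space.
    for i, (_, _, fi) in enumerate(nodes):
        others = sorted(
            (j for j in range(len(nodes)) if j != i),
            key=lambda j: dist(fi, nodes[j][2]),
        )
        edges.append(sorted([i] + others[:k]))
    return edges


def incidence_matrix(num_nodes, edges):
    """H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    return [[1 if v in edge else 0 for edge in edges] for v in range(num_nodes)]


# Toy data: two objects, each observed in two modalities (made-up features).
nodes = [
    ("chair_0", "image", (0.0, 0.0)),
    ("chair_0", "point", (0.1, 0.0)),
    ("table_0", "image", (5.0, 5.0)),
    ("table_0", "point", (5.0, 5.2)),
]
edges = build_hyperedges(nodes, k=1)
H = incidence_matrix(len(nodes), edges)
```

Each column of `H` is one hyperedge and each row is one modality-specific node. A hypergraph convolution would typically propagate features through such an incidence matrix, but the exact operator and the memory-based alignment used by SRCR are not specified in this summary.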

Paper Structure (19 sections, 10 equations, 3 figures, 3 tables)

Introduction
Related Work
Cross-Modal Retrieval
Open-Environment Learning
Methodology
Problem Setup
Framework Architecture
Residual-Center Embedding
Residual Learning
Loss Function for RCE
Hierarchical Structure Learning
Hierarchical Hypergraph Construction
Hypergraph Convolution and Alignment
Loss Function for HSL
Experiments
...and 4 more sections

Figures (3)

Figure 1: Illustration of the proposed SRCR. Given 3D objects of unseen categories represented in different modalities, our method generates a residual-center embedding for each modality of each object. Unified center representations are then generated via hierarchical structure learning, enabling cross-modal retrieval that generalizes to unseen categories.
Figure 2: An overview of the proposed structure-aware residual-center representation framework (SRCR). Our framework comprises two main modules: Residual-Center Embedding (RCE) and Hierarchical Structure Learning (HSL), which are used for residual embedding generation and structure-aware feature alignment, respectively.
Figure 3: Comparison of precision-recall curves for Image2Point retrieval on each of the four datasets.
