CSCPR: Cross-Source-Context Indoor RGB-D Place Recognition

Jing Liang; Zhuo Deng; Zheming Zhou; Min Sun; Omid Ghasemalizadeh; Cheng-Hao Kuo; Arnie Sen; Dinesh Manocha

TL;DR

This work tackles RGB-D indoor place recognition under cross-source conditions by introducing CSCPR, an end-to-end system that integrates global retrieval with a fast, learned reranking stage. The reranker uses two novel modules, Self Context Clusters (SCC) and Cross Source Context Clusters (CSCC), to process multi-scale, multi-source RGB-D features within the Context-of-Clusters framework and produce a robust reranking score. The authors also contribute two large-scale, overlap-based datasets, ScanNetIPR and ARKitIPR. They show that CSCPR improves Recall@1 over state-of-the-art methods by at least 29.27% on ScanNet-PR and 43.24% on the new datasets, and that it is also more efficient. Together, these contributions advance RGB-D indoor place recognition by integrating retrieval and reranking with cross-source adaptability, and they provide resources for further research.
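Recall@1, the headline metric here, counts a query as correct when the top-ranked database frame is a true match. Recall@N generalizes this to the top N frames. The following is a minimal Python sketch. It assumes each query comes with a ranked list of database frame ids and a set of ground-truth positive ids. The function name and input layout are illustrative and are not taken from the authors' code.

```python
from typing import Sequence, Set


def recall_at_n(ranked_ids: Sequence[Sequence[int]],
                positives: Sequence[Set[int]],
                n: int = 1) -> float:
    """Fraction of queries whose top-n retrieved frames contain at least one true positive."""
    if len(ranked_ids) != len(positives):
        raise ValueError("need exactly one positive set per query ranking")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranking, pos in zip(ranked_ids, positives)
        if any(db_id in pos for db_id in ranking[:n])
    )
    return hits / len(ranked_ids)


# Three queries, each with a ranked list of database frame ids and its true positives.
rankings = [[3, 7, 1], [2, 5, 9], [4, 0, 8]]
ground_truth = [{3}, {9}, {6}]
print(recall_at_n(rankings, ground_truth, n=1))  # only query 0 is correct at rank 1
print(recall_at_n(rankings, ground_truth, n=3))  # queries 0 and 1 are correct within the top 3
```

In this toy input, Recall@1 is 1/3 and Recall@3 is 2/3. Reranking aims to move a true match that global retrieval placed at ranks 2 to N up to rank 1.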

Abstract

We extend our previous work, PoCo, and present a new algorithm for RGB-D indoor place recognition, Cross-Source-Context Place Recognition (CSCPR). CSCPR integrates global retrieval and reranking into a single end-to-end model and uses Context-of-Clusters (CoCs) consistently for feature processing. Prior approaches to place recognition reranking focus primarily on the RGB domain, whereas CSCPR is designed to handle RGB-D data. We apply CoCs to cross-source and cross-scale RGB-D point clouds and introduce two novel reranking modules. The Self-Context Cluster (SCC) enhances feature representation, and the Cross Source Context Cluster (CSCC) matches query-database pairs based on local features. We also release two new datasets, ScanNetIPR and ARKitIPR. Our experiments show that CSCPR significantly outperforms state-of-the-art models, improving Recall@1 by at least 29.27% on the ScanNet-PR dataset and by at least 43.24% on the new datasets. Code and datasets will be released.
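The following minimal, stdlib-only Python sketch illustrates the retrieve-then-rerank flow described above and in Figure 2. Global descriptors stand in for $\mathbf{v}_q$ and $\mathbf{v}_d$. Database frames are first ranked by cosine similarity, and then only a top-k shortlist is reordered by a local-feature score. The scoring function `local_score` is a hypothetical placeholder (mean best-match cosine similarity), not the learned SCC/CSCC reranker. All function names and the `top_k` parameter are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def global_retrieval(v_q: Vector, v_db: Dict[int, Vector]) -> List[int]:
    """Stage 1: rank database frame ids by global-descriptor similarity, best first."""
    return sorted(v_db, key=lambda d: cosine(v_q, v_db[d]), reverse=True)


def local_score(f_q: List[Vector], f_d: List[Vector]) -> float:
    """HYPOTHETICAL stand-in for the learned reranking score (not SCC/CSCC):
    mean over query local features of the best cosine match in the database frame."""
    if not f_q or not f_d:
        return 0.0
    return sum(max(cosine(q, d) for d in f_d) for q in f_q) / len(f_q)


def retrieve_and_rerank(v_q: Vector, f_q: List[Vector],
                        v_db: Dict[int, Vector], f_db: Dict[int, List[Vector]],
                        top_k: int = 2) -> List[int]:
    """Stage 2: rescore only the top-k shortlist with local features; keep the tail as ranked."""
    ranking = global_retrieval(v_q, v_db)
    shortlist, tail = ranking[:top_k], ranking[top_k:]
    # sorted() is stable, so shortlist frames with equal local scores keep their global order.
    reranked = sorted(shortlist, key=lambda d: local_score(f_q, f_db[d]), reverse=True)
    return reranked + tail


# Toy example: frame 0 wins globally, but frame 1 matches more of the query's local features.
v_q = [1.0, 0.0]
f_q = [[1.0, 0.0], [0.0, 1.0]]
v_db = {0: [0.9, 0.1], 1: [0.8, 0.6], 2: [0.0, 1.0]}
f_db = {0: [[1.0, 0.0]], 1: [[1.0, 0.0], [0.0, 1.0]], 2: [[0.0, 1.0]]}
print(global_retrieval(v_q, v_db))                   # [0, 1, 2]
print(retrieve_and_rerank(v_q, f_q, v_db, f_db, 2))  # [1, 0, 2]
```

Restricting the expensive local-feature comparison to a short candidate list is what keeps a reranking stage affordable. The global stage prunes the database cheaply, and only the few survivors are compared in detail.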

Paper Structure

This paper contains 15 sections, 6 equations, 9 figures, 4 tables, and 1 algorithm.

Introduction
Related Work
Method
Problem Definition
Architecture of CSCPR
Training
Dataset Generation
Experiment
Conclusion, Limitations, and Future Work
Supplement
Dataset Differences
Generated Datasets
Architecture Details
Failure Analysis
Modified R2Former Reranking

Figures (9)

Figure 1: Real-world Experiment: We propose a novel approach, Cross-Source-Context Place Recognition (CSCPR), for RGB-D indoor place recognition. Given a query frame, global retrieval ranks the potentially matched frames, and our novel place-recognition reranking model reranks the candidates to achieve better recognition accuracy.
Figure 2: Architecture of CSCPR: The blue box indicates Global Retrieval, and the yellow box indicates Reranking. We use PoCo liang2024poco for global retrieval. After the database frames are ranked by comparing the global descriptors $\mathbf{v}_q$ and $\mathbf{v}_d$, the reranking stage computes the similarity of local features between the query and database frames and reranks the database frames. The reranking model consists of two Self Context Clusters (SCC), which process the multi-scale features of each frame, and a Cross Source Context Cluster (CSCC), which computes the similarity of local features between the query and database frames.
Figure 3: Furthest Positive Frame (FPF) of ScanNetPR vs. ScanNetIPR (ours): The FPF is the matched frame with the least overlap with the query in each dataset. For the same query frame (red), ScanNetPR determines matched frames by center distance, which produces an erroneous match with no overlapping area. In ScanNetIPR, overlap (green) is the only matching criterion, which makes the dataset more accurate for training and evaluating place recognition.
Figure 4: Qualitative Comparisons: The 1st and 3rd rows show the RGB images corresponding to the point clouds in the 2nd and 4th rows. Red circles mark the overlapping areas between the query frames (1st column) and the Recall@1 frames retrieved by each approach (later columns). R2Former performs closest to our approach, but it performs poorly on frames at different scales. Our full algorithm (CSCPR) balances geometric and RGB information well and achieves the best performance, even in scenarios with very small overlapping areas.
Figure 5: Architecture Details: Detailed architecture of the SCC and CSCC modules.
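Figure 3 states that ScanNetIPR labels a database frame as a match purely by its overlap with the query. The following minimal Python sketch shows one way to compute such an overlap-based label, under explicit assumptions that are not taken from the paper. Both point clouds are assumed to be already expressed in a common world frame (for example, via known camera poses). Overlap is measured as the fraction of the query's occupied voxels that the database frame also occupies. The `voxel_size` of 0.05 and the `threshold` of 0.3 are arbitrary illustrative values, and all function names are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Map world-frame 3D points to the set of occupied integer voxel indices."""
    return {
        (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        for x, y, z in points
    }


def overlap_ratio(query_pts: Iterable[Point], db_pts: Iterable[Point],
                  voxel_size: float = 0.05) -> float:
    """ASSUMED definition: share of the query's occupied voxels also occupied by the database frame."""
    vq = voxelize(query_pts, voxel_size)
    vd = voxelize(db_pts, voxel_size)
    return len(vq & vd) / len(vq) if vq else 0.0


def is_positive(query_pts: Iterable[Point], db_pts: Iterable[Point],
                threshold: float = 0.3, voxel_size: float = 0.05) -> bool:
    """Label a pair as a match when overlap alone reaches the (illustrative) threshold."""
    return overlap_ratio(query_pts, db_pts, voxel_size) >= threshold


q = [(0.01, 0.01, 0.01), (0.06, 0.01, 0.01), (0.11, 0.01, 0.01), (0.16, 0.01, 0.01)]
d = [(0.11, 0.01, 0.01), (0.16, 0.01, 0.01), (0.21, 0.01, 0.01), (0.26, 0.01, 0.01)]
print(overlap_ratio(q, d))                # 0.5: two of the query's four voxels are shared
print(is_positive(q, d))                  # True
print(is_positive(q, [(5.0, 5.0, 5.0)]))  # False: no shared voxels
```

Unlike a center-distance rule, this label depends only on shared observed geometry. As a result, two frames whose camera centers are close but that look in opposite directions would not be marked as a match.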
