Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions

Marcos Alfaro; Juan José Cabrera; María Flores; Óscar Reinoso; Luis Payá

TL;DR

This work tackles Visual Place Recognition (VPR) under challenging real-world conditions by combining omnidirectional (panoramic) imagery with a hierarchical, coarse-to-fine localization pipeline. It introduces curriculum learning-based triplet losses that progressively increase training difficulty, yielding more discriminative embeddings for both room-level retrieval and intra-room positioning. Across indoor and outdoor datasets, models trained with the proposed losses outperform those trained with standard triplet losses, remain robust to illumination changes, noise, occlusions, and motion blur, and generalize well with limited training data. The approach offers a practical, efficient solution for real-world robotic localization, and its code is publicly available to facilitate adoption and further research.

Abstract

This paper addresses Visual Place Recognition (VPR), which is essential for the safe navigation of mobile robots. The solution we propose employs panoramic images and deep learning models fine-tuned with triplet loss functions that integrate curriculum learning strategies. By progressively presenting more challenging examples during training, these loss functions enable the model to learn more discriminative and robust feature representations, overcoming the limitations of conventional contrastive loss functions. After training, VPR is tackled in two steps: coarse localization (room retrieval) and fine localization (position estimation). The results demonstrate that the curriculum-based triplet losses consistently outperform standard contrastive loss functions, particularly under challenging perceptual conditions. To thoroughly assess its robustness and generalization capabilities, the method is evaluated in a variety of indoor and outdoor environments. The approach is tested against common challenges of real operating conditions, including severe illumination changes, dynamic visual effects such as noise and occlusions, and scenarios with limited training data. The results show that the proposed framework performs competitively in all these situations, achieving high recognition accuracy and demonstrating its potential as a reliable solution for real-world robotic applications. The code used in the experiments is available at https://github.com/MarcosAlfaro/TripletNetworksIndoorLocalization.git.
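The abstract describes triplet losses that progressively present more challenging examples during training. The exact curriculum used by the authors is not given in this summary, so the following is only a hypothetical sketch: a standard triplet margin loss, max(0, d(a, p) - d(a, n) + m), combined with a curriculum that picks increasingly hard negatives (closer to the anchor) as training advances. The function names, the linear schedule, and the margin of 0.5 are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.5) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)  # Euclidean distance anchor-positive
    d_neg = math.dist(anchor, negative)  # Euclidean distance anchor-negative
    return max(0.0, d_pos - d_neg + margin)


def select_negative(anchor: Vector, negatives: List[Vector],
                    progress: float) -> Vector:
    """Hypothetical curriculum: pick harder negatives as progress grows.

    Negatives are ranked from easiest (farthest from the anchor) to hardest
    (closest). progress = 0 picks the easiest, progress = 1 the hardest.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    ranked = sorted(negatives, key=lambda n: math.dist(anchor, n), reverse=True)
    return ranked[round(progress * (len(ranked) - 1))]


def curriculum_triplet_loss(anchor: Vector, positive: Vector,
                            negatives: List[Vector], epoch: int,
                            total_epochs: int, margin: float = 0.5) -> float:
    """Triplet loss whose negative becomes harder linearly over the epochs."""
    progress = epoch / max(total_epochs - 1, 1)  # 0 at first epoch, 1 at last
    negative = select_negative(anchor, negatives, progress)
    return triplet_loss(anchor, positive, negative, margin)
```

For example, with anchor `[0.0, 0.0]`, positive `[0.1, 0.0]`, and negatives at distances 2.0, 1.0, and 0.3 from the anchor, the loss is 0 at the first of 10 epochs (the easiest negative already satisfies the margin) and about 0.3 at the last epoch (the hardest negative violates it), so the gradient signal concentrates on difficult examples only once training has progressed.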

Paper Structure

This paper contains 19 sections, 1 equation, 10 figures, and 7 tables.

Introduction
State of the art
Methodology
Hierarchical localization
Stage 1: Coarse Localization (Room Retrieval)
Stage 2: Fine Localization (Intra-Room Positioning)
Backbone selection
Triplet loss functions
Experiments
Datasets
Indoor dataset: COLD
Mixed indoor-outdoor dataset: 360Loc
Experiment 1. Evaluation of loss functions
Experiment 2. Analysis of the robustness against dynamic effects
Noise effect
...and 4 more sections
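The outline lists an analysis of robustness against dynamic effects, starting with noise; the abstract also mentions occlusions, and the TL;DR mentions motion blur. The following is one hypothetical way to synthesize such perturbations on a grayscale panorama stored as a list of rows with intensities in [0, 1]. The noise level, occlusion rectangle, blur kernel, and function names are illustrative assumptions, not the parameters used in the experiments.

```python
import random
from typing import List, Optional

Image = List[List[float]]  # grayscale image, intensities in [0, 1]


def add_gaussian_noise(img: Image, sigma: float,
                       seed: Optional[int] = None) -> Image:
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    rng = random.Random(seed)  # seeded generator for reproducible tests
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in img]


def add_occlusion(img: Image, top: int, left: int, height: int, width: int,
                  value: float = 0.0) -> Image:
    """Overwrite a rectangular patch, e.g. to mimic an object blocking the view."""
    out = [row[:] for row in img]  # copy so the input image is not modified
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = value
    return out


def horizontal_motion_blur(img: Image, kernel_size: int) -> Image:
    """Average each pixel with horizontal neighbours (box filter along rows).

    Column indices wrap around, since the left and right edges of a
    360-degree panorama show adjacent parts of the scene.
    """
    half = kernel_size // 2
    out = []
    for row in img:
        w = len(row)
        out.append([sum(row[(c + k) % w] for k in range(-half, kernel_size - half))
                    / kernel_size for c in range(w)])
    return out
```

The wrap-around in the blur is a design choice of this sketch: cropping or zero-padding at the image borders would introduce an artificial seam that a real panoramic camera does not have.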

Figures (10)

Figure 1: Hierarchical localization process performed in two steps: (a) coarse localization (room retrieval); (b) fine localization (estimating the robot coordinates inside the retrieved room(s)).
Figure 2: Sample images from the COLD database [pronobis2009]. The top row shows different lighting conditions: (a) Cloudy, (b) Night, and (c) Sunny. The bottom row presents different environments: (d) FR-A, (e) SA-A, and (f) SA-B.
Figure 3: Sample images from the 360Loc database [huang2024], captured in the (a, b) atrium and (c, d) hall environments under (a, c) day and (b, d) night conditions.
Figure 4: Performance in (a) coarse localization and (b) fine localization for different training set sizes.
Figure 5: Qualitative VPR results. The left column shows coarse localization (room retrieval) for (a) Cloudy, (c) Night, and (e) Sunny conditions. The right column shows fine localization (position retrieval) for (b) Cloudy, (d) Night, and (f) Sunny conditions.
...and 5 more figures
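Figure 1 describes the hierarchical process: coarse localization retrieves the room(s), then fine localization estimates the robot coordinates inside them. A minimal sketch of that two-stage retrieval follows, assuming descriptors have already been computed by the trained network and are compared with Euclidean distance. Summarizing each room by the mean of its map descriptors, and returning the capture position of the nearest map image as the estimate, are assumptions made for illustration; the paper may retrieve rooms differently. All names (`MapImage`, `localize`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class MapImage:
    room: str                      # room label of the map image
    position: Tuple[float, float]  # (x, y) capture coordinates
    descriptor: List[float]        # embedding produced by the trained network


def room_centroids(database: Sequence[MapImage]) -> Dict[str, List[float]]:
    """Assumption: each room is summarized by the mean of its descriptors."""
    groups: Dict[str, List[List[float]]] = {}
    for img in database:
        groups.setdefault(img.room, []).append(img.descriptor)
    return {room: [sum(col) / len(col) for col in zip(*descs)]
            for room, descs in groups.items()}


def coarse_localization(query: List[float], database: Sequence[MapImage],
                        k: int = 1) -> List[str]:
    """Stage 1: return the k rooms whose centroid is closest to the query."""
    centroids = room_centroids(database)
    ranked = sorted(centroids, key=lambda room: math.dist(query, centroids[room]))
    return ranked[:k]


def fine_localization(query: List[float], database: Sequence[MapImage],
                      rooms: List[str]) -> Tuple[float, float]:
    """Stage 2: nearest neighbour among the map images of the retrieved rooms."""
    candidates = [img for img in database if img.room in rooms]
    best = min(candidates, key=lambda img: math.dist(query, img.descriptor))
    return best.position


def localize(query: List[float], database: Sequence[MapImage],
             k: int = 1) -> Tuple[List[str], Tuple[float, float]]:
    """Run both stages and return (retrieved rooms, estimated position)."""
    rooms = coarse_localization(query, database, k)
    return rooms, fine_localization(query, database, rooms)
```

Passing `k > 1` keeps several candidate rooms, matching the "retrieved room(s)" wording of Figure 1. Restricting the fine search to those rooms reduces the number of image comparisons in the second stage compared with matching the query against every map image.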
