Introspection in Learned Semantic Scene Graph Localisation

Manshika Charvi Bissessur, Efimia Panagiotaki, Daniele De Martini

TL;DR

This paper investigates how semantic cues influence localisation robustness by training a semantics-only, graph-based localisation model and performing thorough post-hoc introspection. It demonstrates that Integrated Gradients and Attention Weights provide reliable attributions for object-class importance, revealing a TF-IDF-like down-weighting of frequent classes and a bias toward distinctive landmarks. The methodology combines a GNN backbone with perturbation-based class-importance analyses and fidelity tests to yield explainable registration under challenging conditions. The findings highlight that semantically salient relations, rather than mere geometry or frequency, drive robust localisation, with practical implications for safety-critical robotics and interpretable SLAM systems. The work also outlines limitations of a purely semantic setup and suggests future integration of geometric cues and evaluation on diverse datasets to strengthen explanations in real-world deployments.

Abstract

This work investigates how semantics influence localisation performance and robustness in a learned self-supervised, contrastive semantic localisation framework. After training a localisation network on both original and perturbed maps, we conduct a thorough post-hoc introspection analysis to probe whether the model filters environmental noise and prioritises distinctive landmarks over routine clutter. We validate various interpretability methods and present a comparative reliability analysis. Integrated Gradients and Attention Weights consistently emerge as the most reliable probes of learned behaviour. A semantic class ablation further reveals an implicit weighting in which frequent objects are often down-weighted. Overall, the results indicate that the model learns noise-robust, semantically salient relations about place definition, thereby enabling explainable registration under challenging visual and structural variations.
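
To make the Integrated Gradients probe mentioned above concrete, here is a minimal sketch of the standard formulation: the gradient of a score is averaged along a straight path from a baseline input to the actual input and scaled by the input difference. The toy `score` function, its weights `w`, the all-zeros baseline, and the step count are assumptions chosen for illustration; they are not the paper's model or settings.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients.

    grad_fn(p) must return the gradient of the score with respect to p.
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from baseline to x, shape (steps, dim).
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(p) for p in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical stand-in for a similarity score (assumption, not the paper's GNN).
w = np.array([0.5, -1.0, 2.0])

def score(x):
    return float(np.tanh(w @ x))

def score_grad(x):
    # Analytic gradient of tanh(w . x) with respect to x.
    return (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([1.0, 0.5, 0.25])
baseline = np.zeros_like(x)
attr = integrated_gradients(score_grad, x, baseline)

# Completeness check: attributions should sum to score(x) - score(baseline).
print(attr, attr.sum(), score(x) - score(baseline))
```

The completeness check at the end is a useful sanity test for any Integrated Gradients implementation: if the summed attributions drift far from the score difference, the path integral needs more steps.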

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Floor plan of the office scene. Each L2 and L3 node is plotted at its metric coordinates. L3 nodes are coloured by cosine similarity to a query.
  • Figure 2: Place-query matrices. Diagonal structure indicates correct matches; our model sharpens the diagonal and suppresses false positives.
  • Figure 3: Importance of each object node averaged over all place embeddings in the office scene, as determined by four explainability methods: Saliency, Integrated Gradients, Attention Weights, and Shapley Value Sampling. All methods highlight a small subset of highly influential objects, but notable discrepancies in mid-range scores reveal method-specific differences in assessing node relevance.
  • Figure 4: Normalised change in PR-AUC resulting from removal of each semantic class from the scene graph, measured against the change in node importance distribution obtained from attention weights. The positive correlation confirms that attention can serve as an effective attribution method.
  • Figure 5: Fidelity+ (a, necessity of the removed nodes), fidelity- (b, sufficiency of the kept nodes) and combined characterisation score (c) curves on Run 1. Integrated Gradients performs best overall, as indicated by its higher characterisation score, while Attention exhibits a steep initial slope, suggesting that it is effective at identifying the most critical nodes.
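
The fidelity curves described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: fidelity+ is taken as the score drop when the top-ranked nodes are removed, fidelity- as the score drop when only those nodes are kept, and the characterisation score as a weighted harmonic mean of fidelity+ and (1 - fidelity-), a common convention in the graph-explainability literature. The additive toy score, the `contributions` vector, and the equal weights are hypothetical.

```python
import numpy as np

def fidelity_curves(score_fn, importance, fractions):
    """Fidelity+ and fidelity- for the top-k nodes at each fraction."""
    n = len(importance)
    order = np.argsort(-importance)          # most important first
    full = score_fn(np.ones(n, dtype=bool))  # score with all nodes present
    fid_plus, fid_minus = [], []
    for f in fractions:
        k = int(round(f * n))
        top = np.zeros(n, dtype=bool)
        top[order[:k]] = True
        fid_plus.append(full - score_fn(~top))  # drop when top-k are removed
        fid_minus.append(full - score_fn(top))  # drop when only top-k are kept
    return np.array(fid_plus), np.array(fid_minus)

def characterisation_score(fid_plus, fid_minus, w_plus=0.5, w_minus=0.5, eps=1e-12):
    """Weighted harmonic mean of fidelity+ and (1 - fidelity-) (assumed form)."""
    fp = np.clip(fid_plus, eps, 1.0)
    fm = np.clip(1.0 - fid_minus, eps, 1.0)
    return (w_plus + w_minus) / (w_plus / fp + w_minus / fm)

# Hypothetical additive score: each node contributes a fixed amount.
contributions = np.array([0.5, 0.3, 0.1, 0.06, 0.04])

def toy_score(mask):
    return float(contributions[mask].sum())

# Use the true contributions as the importance ranking (an ideal explainer).
fp, fm = fidelity_curves(toy_score, contributions, fractions=[0.2, 0.4])
char = characterisation_score(fp, fm)
print(fp, fm, char)
```

A good explainer yields high fidelity+ and low fidelity- at small fractions, which is what a steep initial slope in such curves reflects.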