Image Patch-Matching with Graph-Based Learning in Street Scenes
Rui She, Qiyu Kang, Sijie Wang, Wee Peng Tay, Yong Liang Guan, Diego Navarro Navarro, Andreas Hartmannsgruber
TL;DR
This paper addresses robust landmark patch matching in street scenes by exploiting spatial relationships among patches. It introduces VGIDM, a graph-based framework that constructs neighborhood graphs for patches, learns joint vertex and graph embeddings via a ResNet encoder and a GNN, and uses a learnable discriminator to maximize the information distance between the distributions conditioned on matched and unmatched patch pairs. The authors ground the learning objective in information-theoretic distances and report state-of-the-art results on two new landmark patch datasets derived from KITTI and Oxford RobotCar, with strong cross-dataset generalization. The work also shows practical utility in visual place recognition and stereo depth estimation, positioning VGIDM as a versatile module for object-level data association in autonomous driving pipelines.
Abstract
Matching landmark patches from a real-time image captured by an on-vehicle camera with landmark patches in an image database plays an important role in various computer perception tasks for autonomous driving. Current methods focus on local matching for regions of interest and do not take into account spatial neighborhood relationships among the image patches, which typically correspond to objects in the environment. In this paper, we construct a spatial graph with the graph vertices corresponding to patches and edges capturing the spatial neighborhood information. We propose a joint feature and metric learning model with graph-based learning. We provide a theoretical basis for the graph-based loss by showing that the information distance between the distributions conditioned on matched and unmatched pairs is maximized under our framework. We evaluate our model using several street-scene datasets and demonstrate that our approach achieves state-of-the-art matching results.
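To make the spatial-graph idea concrete, the following is a minimal sketch of one way such a graph could be built from patch bounding boxes. The abstract states only that vertices correspond to patches and edges capture spatial neighborhood information; the k-nearest-neighbor rule on patch centers, the box format, and the names `patch_center` and `build_spatial_graph` are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def patch_center(box: Box) -> Tuple[float, float]:
    """Return the center point of a patch bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def build_spatial_graph(boxes: List[Box], k: int = 2) -> Dict[int, List[int]]:
    """Link each patch to its k nearest patches (assumed neighborhood rule).

    Returns an undirected adjacency list keyed by patch index.
    """
    centers = [patch_center(b) for b in boxes]
    adjacency = {i: set() for i in range(len(boxes))}
    for i, ci in enumerate(centers):
        # Rank the other patches by Euclidean distance between centers;
        # the index is a tie-breaker so the ordering is deterministic.
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: (math.dist(ci, centers[j]), j),
        )
        for j in others[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)  # symmetrize: the graph is undirected
    return {i: sorted(nbrs) for i, nbrs in adjacency.items()}


# Four illustrative patches (made-up coordinates, not from any dataset).
boxes = [
    (0, 0, 10, 10),     # center (5, 5)
    (20, 0, 30, 10),    # center (25, 5)
    (0, 30, 10, 40),    # center (5, 35)
    (100, 0, 110, 10),  # center (105, 5)
]
graph = build_spatial_graph(boxes, k=1)
print(graph)  # {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
```

In a pipeline like the one described, each vertex would additionally carry a feature vector for its patch (for example, from an image encoder), and the adjacency list would define which vertices a graph neural network aggregates over. Other neighborhood rules, such as a distance threshold, would fit the same interface.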
