Image Patch-Matching with Graph-Based Learning in Street Scenes

Rui She, Qiyu Kang, Sijie Wang, Wee Peng Tay, Yong Liang Guan, Diego Navarro Navarro, Andreas Hartmannsgruber

TL;DR

This paper addresses the challenge of robust landmark patch matching in street scenes by exploiting spatial relationships among patches. It introduces VGIDM, a graph-based framework that constructs neighborhood graphs for patches, learns joint vertex and graph embeddings via a ResNet encoder and a GNN, and uses a learnable discriminator to maximize the information distance between matched and unmatched patch pairs. The authors provide theoretical grounding that links the learning objective to information-theoretic distances, and they report state-of-the-art results on two new landmark patch datasets derived from KITTI and Oxford Radar RobotCar, with strong cross-dataset generalization. The work also shows practical utility in visual place recognition and stereo depth estimation, positioning VGIDM as a versatile module for object-level data association in autonomous driving pipelines.
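The TL;DR says a learnable discriminator is trained to separate matched from unmatched pairs, but this summary does not give the exact loss. As a minimal sketch, assuming a Jensen-Shannon-style estimator of the kind used in Deep InfoMax-type methods (a stand-in, not necessarily VGIDM's objective), the loss below takes discriminator logits for matched (positive) and unmatched (negative) pairs and is minimized when positives score high and negatives score low. The function names `softplus` and `jsd_discriminator_loss` are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jsd_discriminator_loss(pos_scores: Sequence[float],
                           neg_scores: Sequence[float]) -> float:
    """Negative Jensen-Shannon lower bound (Deep InfoMax style; an assumed
    stand-in for the paper's objective).

    pos_scores: discriminator logits for matched pairs.
    neg_scores: discriminator logits for unmatched pairs.
    Minimizing pushes matched logits up and unmatched logits down, which
    widens the gap between the two score distributions.
    """
    # softplus(-s) = -log(sigmoid(s)): penalizes low scores on positives.
    pos_term = sum(softplus(-s) for s in pos_scores) / len(pos_scores)
    # softplus(s) = -log(1 - sigmoid(s)): penalizes high scores on negatives.
    neg_term = sum(softplus(s) for s in neg_scores) / len(neg_scores)
    return pos_term + neg_term
```

With an uninformative discriminator (all logits 0) the loss equals 2 log 2; it approaches 0 as matched and unmatched scores separate, which is the intuition behind "maximizing information distance."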

Abstract

Matching landmark patches from a real-time image captured by an on-vehicle camera with landmark patches in an image database plays an important role in various computer perception tasks for autonomous driving. Current methods focus on local matching for regions of interest and do not take into account spatial neighborhood relationships among the image patches, which typically correspond to objects in the environment. In this paper, we construct a spatial graph with the graph vertices corresponding to patches and edges capturing the spatial neighborhood information. We propose a joint feature and metric learning model with graph-based learning. We provide a theoretical basis for the graph-based loss by showing that the information distance between the distributions conditioned on matched and unmatched pairs is maximized under our framework. We evaluate our model using several street-scene datasets and demonstrate that our approach achieves state-of-the-art matching results.
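The abstract states only that edges capture spatial neighborhood information; the exact connection rule is not given here. As a minimal sketch, assuming each patch is an axis-aligned bounding box and that each patch is linked to its `k` nearest patches by box-center distance (a hypothetical rule chosen for illustration), the spatial graph can be built as an undirected adjacency list. `box_center` and `knn_spatial_graph` are hypothetical names.

```python
import math
from typing import List, Set, Tuple

# A bounding box as (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of an axis-aligned bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def knn_spatial_graph(boxes: List[Box], k: int) -> List[Set[int]]:
    """Build an undirected k-nearest-neighbor graph over patch centers.

    Vertex i is a landmark patch; edge (i, j) is added when j is among the
    k patches closest to i, or i is among the k closest to j.
    Returns an adjacency list of neighbor index sets.
    """
    centers = [box_center(b) for b in boxes]
    n = len(centers)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        # Rank every other patch by Euclidean distance to patch i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize so the graph is undirected
    return adj
```

Symmetrizing means a vertex can end up with more than `k` neighbors; a radius threshold or a Delaunay triangulation would be equally plausible alternatives.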

Paper Structure (22 sections, 22 equations, 16 figures, 12 tables)

Figures (16)

  • Figure 1: Landmark patch-matching using spatial graphs in street scenes and its potential applications.
  • Figure 2: Landmark patch matching in two full-sized images sampled from the Oxford Radar RobotCar dataset. Matched landmark patches are labeled with bounding boxes of the same color, while a white bounding box indicates that the landmark patch in one image has no matching patch in the other image. Green lines indicate the graph edges constructed by our model.
  • Figure 3: VGIDM: landmark patch-matching with graph-based learning. The ResNet $f$ in the framework is a shared network that serves as the feature descriptor, extracting high-dimensional features from patches. Likewise, the discriminator $d$ is shared and decides the vertex-to-graph correspondence. The model takes as input a pair of image patches that correspond to street-scene landmarks.
  • Figure 4: (a) and (b) are landmark patch samples (displayed with intentionally included background) from the KITTI and Oxford Radar RobotCar datasets, respectively.
  • Figure 5: A semantic segmentation image and its corresponding real image, both with bounding box labels, from the KITTI dataset.
  • ...and 11 more figures
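Figure 3 describes a shared feature extractor $f$ and a shared discriminator $d$ that scores vertex-to-graph correspondence. A toy sketch of that data flow, assuming the patch features have already been produced by $f$, and assuming one mean-aggregation GNN layer, a mean readout, and a bilinear sigmoid discriminator (all illustrative choices, not confirmed as VGIDM's layers), could look like this. `gnn_mean_layer`, `readout`, and `discriminator` are hypothetical names, and plain lists stand in for tensors.

```python
import math
from typing import List, Set

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for list-based matrices."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gnn_mean_layer(feats: List[Vector], adj: List[Set[int]],
                   w: Matrix) -> List[Vector]:
    """One message-passing layer: average each vertex with its neighbors,
    apply a shared linear map, then ReLU."""
    out = []
    for i, h in enumerate(feats):
        group = [h] + [feats[j] for j in adj[i]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        out.append([max(0.0, z) for z in matvec(w, mean)])
    return out


def readout(vertex_embs: List[Vector]) -> Vector:
    """Graph-level embedding as the mean of vertex embeddings."""
    return [sum(col) / len(vertex_embs) for col in zip(*vertex_embs)]


def discriminator(v: Vector, g: Vector, w_d: Matrix) -> float:
    """Bilinear score sigmoid(v^T W_d g): estimated probability that
    vertex embedding v corresponds to graph embedding g."""
    logit = sum(vi * wg for vi, wg in zip(v, matvec(w_d, g)))
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # stable branch for large negative logits
    return z / (1.0 + z)
```

Because the same weights are reused for both images of a pair, scores from matched and unmatched vertex-graph combinations are directly comparable, which is what a discriminator-based loss needs.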

Theorems & Definitions (3)

  • proof
  • proof
  • proof