Learnable Graph Matching: A Practical Paradigm for Data Association

Jiawei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang

TL;DR

This work reframes data association as a general graph matching problem by modeling intra-view context with graphs and cross-view relationships via a learnable, differentiable graph matching layer. The core methodological advances include relaxing Koopmans–Beckmann's QAP to a convex QP with a differentiable KKT-based layer, a cross-graph GCN for feature refinement, and a GST-based solver to speed up quadratic assignment. The approach yields state-of-the-art or competitive results across multi-object tracking (MOT), image matching, and point-cloud registration, with notable gains in association metrics and substantial speedups in inference. The practical impact is a unified, end-to-end trainable framework that jointly optimizes appearance, geometry, and combinatorial assignment for robust data association in complex scenes.
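
For intuition about the quadratic assignment objective mentioned above, the following is a minimal Python sketch. It assumes the standard textbook form of Koopmans–Beckmann's QAP, maximizing $\mathrm{tr}(\mathbf{A}_1 \mathbf{X} \mathbf{A}_2 \mathbf{X}^\top) + \mathrm{tr}(\mathbf{B}^\top \mathbf{X})$ over permutation matrices $\mathbf{X}$; the paper's exact formulation may differ. The helper names (`kb_objective`, `perm_matrix`, `brute_force_match`) are hypothetical, and exhaustive search stands in for the relaxed QP and solver described in the paper, so it is only usable on tiny graphs.

```python
import itertools
import numpy as np

def kb_objective(A1, A2, B, X):
    """Koopmans-Beckmann score: edge agreement plus vertex affinity."""
    return np.trace(A1 @ X @ A2 @ X.T) + np.trace(B.T @ X)

def perm_matrix(perm):
    """X[i, perm[i]] = 1: vertex i of graph 1 is matched to vertex perm[i] of graph 2."""
    n = len(perm)
    X = np.zeros((n, n))
    X[np.arange(n), perm] = 1.0
    return X

def brute_force_match(A1, A2, B):
    """Exhaustive search over all permutations (illustration only, O(n!))."""
    n = A1.shape[0]
    best = max(
        itertools.permutations(range(n)),
        key=lambda p: kb_objective(A1, A2, B, perm_matrix(p)),
    )
    return list(best)

# Toy data: graph 2 is graph 1 with its vertices relabeled by true_perm.
rng = np.random.default_rng(0)
n = 4
W = rng.random((n, n))
A1 = (W + W.T) / 2           # symmetric weighted adjacency (undirected graph)
np.fill_diagonal(A1, 0.0)
true_perm = [2, 0, 3, 1]
P = perm_matrix(true_perm)
A2 = P.T @ A1 @ P            # relabeled copy of graph 1
B = np.zeros((n, n))         # no vertex affinity: edges alone drive the match

recovered = brute_force_match(A1, A2, B)
print(recovered == true_perm)
```

With the vertex affinity set to zero, the match is recovered from pairwise (edge) structure alone, which is the kind of intra-view context the TL;DR refers to.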

Abstract

Data association is at the core of many computer vision tasks, e.g., multiple object tracking, image matching, and point cloud registration. However, current data association solutions have two defects: they mostly ignore intra-view context information, and they either train deep association models end-to-end and hardly exploit the advantages of optimization-based assignment methods, or use only an off-the-shelf neural network to extract features. In this paper, we propose a general learnable graph matching method to address these issues. Specifically, we model the intra-view relationships as an undirected graph, so that data association becomes a general graph matching problem between graphs. Furthermore, to make the optimization end-to-end differentiable, we relax the original graph matching problem into a continuous quadratic program and then incorporate training into a deep graph neural network using the KKT conditions and the implicit function theorem. On the MOT task, our method achieves state-of-the-art performance on several MOT datasets. For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet. For point cloud registration, we also achieve competitive results. Code will be available at https://github.com/jiaweihe1996/GMTracker.
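
As a hedged illustration of how the KKT conditions and the implicit function theorem make a QP differentiable, consider a generic convex QP. The symbols $Q, q, A, b, G, h$ below are generic placeholders, not the paper's notation, and the derivation follows the standard treatment used for differentiable QP layers rather than the paper's exact formulation:

$$
\min_{x}\ \tfrac{1}{2} x^\top Q x + q^\top x \quad \text{s.t.}\quad A x = b,\ \ G x \le h .
$$

At an optimum $(x^*, \lambda^*, \nu^*)$, with $D(\cdot)$ denoting a diagonal matrix built from a vector, the KKT conditions read

$$
Q x^* + q + A^\top \nu^* + G^\top \lambda^* = 0, \qquad A x^* - b = 0, \qquad D(\lambda^*)\,(G x^* - h) = 0 .
$$

Taking differentials of these equations while perturbing only the objective parameters $Q$ and $q$ (the constraints are assumed fixed here) gives a linear system for the change in the solution:

$$
\begin{bmatrix} Q & G^\top & A^\top \\ D(\lambda^*)\, G & D(G x^* - h) & 0 \\ A & 0 & 0 \end{bmatrix}
\begin{bmatrix} dx \\ d\lambda \\ d\nu \end{bmatrix}
= -\begin{bmatrix} dQ\, x^* + dq \\ 0 \\ 0 \end{bmatrix} .
$$

Solving this system yields $\partial x^* / \partial Q$ and $\partial x^* / \partial q$, so gradients of a training loss on $x^*$ can be propagated back to the network that produced the affinities.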

Paper Structure

This paper contains 32 sections, 2 theorems, 26 equations, 7 figures, 12 tables, 2 algorithms.

Key Result

Proposition 5.1

The quadratic program in Eq. \ref{finalQP} can be solved in $O(n_d^3 n_t^3)$ arithmetic operations.
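
One hedged way to read this bound, assuming $n_d$ and $n_t$ denote the numbers of detections and tracklets and that the relaxed QP has one variable per detection-tracklet pair: the problem then has $N = n_d n_t$ variables, and a solver whose cost is cubic in the number of variables gives the stated order. The numbers below are purely illustrative and are not taken from the paper.

$$
N = n_d\, n_t, \qquad O(N^3) = O(n_d^3 n_t^3); \qquad n_d = n_t = 50 \ \Rightarrow\ N = 2500,\quad N^3 \approx 1.56 \times 10^{10} .
$$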

Figures (7)

  • Figure 1: An illustration of the intra-graph relationships used in our graph matching formulation. We use second-order edges to model pairwise relationships, which is more robust in challenging scenes, such as heavy occlusion in the MOT task. For example, the entity with ID 1 in view 2 cannot be correctly associated with the corresponding entity in view 1. With graph matching, however, the pairwise relationships help data association.
  • Figure 2: An example of the derivation from the edge affinity matrix $\mathbf{M}_e$ to the quadratic affinity matrix $\mathbf{M}$.
  • Figure 3: Overview of our GMTracker method. We first extract features from detections and construct the detection graph using these features. The tracklet graph is constructed in the same way as the detection graph, except that we average the features within a tracklet. Then the cross-graph GCN is adopted to enhance the features. The weight $w_{i,j}$ is derived from feature similarity and geometric information. The core of our method is the differentiable graph matching layer, built as a QP layer from the formulation in Eq. \ref{finalQP}. The $\mathbf{M}_e$ and $\mathbf{B}$ in the graph matching layer denote the edge affinity matrix from Eq. \ref{eq:e2e} and the vertex affinity matrix from Eq. \ref{eq:n2n}, respectively.
  • Figure 4: An illustration of edge matching. Here, for the matched pair $(c_1^D,c_1^T)$ in $\bm{\pi}_c$, we find the best matching between edges $e_{1,i'}$ and $e_{1,j'}$; matched edges are drawn in the same color.
  • Figure 5: Pipeline of our image matching network, GMatcher. The backbone is an FPN-like module. The edge and vertex features are taken from the stride-8 and stride-2 feature maps, respectively. The edge and vertex AGNNs operate independently. The learnable graph matching layer replaces the Sinkhorn layer in SuperGlue.
  • ...and 2 more figures
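
The step named in the Figure 2 caption, going from an edge affinity matrix $\mathbf{M}_e$ to a quadratic affinity matrix $\mathbf{M}$, can be sketched in Python under stated assumptions. The sketch assumes $\mathbf{M}_e$ has one row per edge of the first graph and one column per edge of the second, that edges are given as directed `(source, target)` pairs (an undirected edge would be listed in both directions), and that assignments are flattened row-major. The function name `quadratic_affinity` and the indexing convention are hypothetical and may not match the paper's construction.

```python
import numpy as np

def quadratic_affinity(edges1, edges2, M_e, n1, n2):
    """Scatter edge affinities M_e[k, l] into an (n1*n2) x (n1*n2) matrix M.

    Entry M[i*n2 + a, j*n2 + b] holds the affinity between edge (i, j) of
    graph 1 and edge (a, b) of graph 2, i.e. the reward for matching
    i -> a and j -> b at the same time.
    """
    M = np.zeros((n1 * n2, n1 * n2))
    for k, (i, j) in enumerate(edges1):
        for l, (a, b) in enumerate(edges2):
            M[i * n2 + a, j * n2 + b] = M_e[k, l]
    return M

# Toy example: two 3-vertex chain graphs with two edges each.
edges1 = [(0, 1), (1, 2)]
edges2 = [(0, 1), (1, 2)]
M_e = np.array([[1.0, 0.2],
                [0.3, 2.0]])
M = quadratic_affinity(edges1, edges2, M_e, n1=3, n2=3)

# Score the identity assignment: x is the row-major flattening of X.
X = np.eye(3)
x = X.reshape(-1)
score = x @ M @ x
print(score)
```

Under the identity assignment, only the edge pairs (edge 0, edge 0) and (edge 1, edge 1) are simultaneously matched, so the quadratic form sums exactly those two entries of $\mathbf{M}_e$.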

Theorems & Definitions (2)

  • Proposition 5.1: Original complexity
  • Proposition 5.2