Gotta match 'em all: Solution diversification in graph matching matched filters

Zhirui Li; Ben Johnson; Daniel L. Sussman; Carey E. Priebe; Vince Lyzinski

TL;DR

This work tackles the problem of finding multiple noisily embedded templates in a large background graph by extending the graph-matching matched-filter framework with solution diversification. It introduces a Multiple Correlated Erdős-Rényi model to capture multiple embedded templates and a node-feature similarity term through a matrix $S$, integrated into a padded GMP objective to encourage diverse recoveries. The authors prove that, under mild conditions, down-weighting strong templates enables recovery of weaker templates with high probability, and they implement scalable speedups for the optimization, including reduced complexity for the linear assignment subproblem and random restarts with masking. Empirically, the approach yields multiple recovered templates in synthetic MCER settings and demonstrates practical utility on MRI brain connectomes and a large Transactional Knowledge Base, highlighting both the benefits and trade-offs of padding choices and penalty tuning for diversification.
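
The multiple correlated Erdős-Rényi model itself is not spelled out in this summary. The sketch below shows the standard $\rho$-correlated Erdős-Rényi construction for a single embedded template, which is presumably the basic ingredient. The background $B$ is drawn from $\mathrm{ER}(n, p)$. Each template edge $A_{ij}$ is then drawn with probability $p + \rho(1-p)$ if the corresponding background edge is present, and with probability $p(1-\rho)$ otherwise. As a result, $A_{ij}$ is marginally Bernoulli($p$) with correlation $\rho$ to $B$. The function name sample_correlated_template is hypothetical. The sketch does not reproduce how the paper combines several overlapping templates with different correlations.

```python
import numpy as np


def sample_correlated_template(n, m, p, rho, rng):
    """Hypothetical helper: background B ~ ER(n, p) plus a template A on m nodes
    that is rho-correlated with the induced subgraph of B on a random node set."""
    upper = np.triu(rng.random((n, n)) < p, 1)        # independent upper-triangular edges
    B = (upper | upper.T).astype(int)                 # symmetric, zero diagonal
    nodes = rng.choice(n, size=m, replace=False)      # where the template is embedded
    sub = B[np.ix_(nodes, nodes)]
    # Conditional edge probabilities that keep Bernoulli(p) marginals with correlation rho.
    prob = np.where(sub == 1, p + rho * (1 - p), p * (1 - rho))
    upper_a = np.triu(rng.random((m, m)) < prob, 1)
    A = (upper_a | upper_a.T).astype(int)
    return A, B, nodes
```

In this sketch, rho=1 plants an exact copy of the induced subgraph. Smaller values of rho give noisier, weaker embedded templates.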

Abstract

We present a novel approach for finding multiple noisily embedded template graphs in a very large background graph. Our method builds on the graph-matching-matched-filter technique proposed in Sussman et al. It discovers multiple diverse matchings by iteratively penalizing a suitable node-pair similarity matrix in the matched-filter algorithm. In addition, we propose algorithmic speed-ups that greatly enhance the scalability of our matched-filter approach. We give theoretical justification for our methodology in the setting of correlated Erdős-Rényi graphs, showing that it can sequentially discover multiple templates under mild model conditions. We also demonstrate the method's utility through extensive experiments on both simulated models and real-world datasets, including human brain connectomes and a large transactional knowledge base.
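
The iterative penalization loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes centered padding, where $\tilde A$ is $2A - J$ zero-padded to $n \times n$ and $\tilde B = 2B - J$. It also assumes the objective $\max_{P \in \Pi_n} \operatorname{tr}(\tilde A P \tilde B P^\top) + \lambda \operatorname{tr}(S^\top P)$, solved approximately by Frank-Wolfe (FAQ-style) iterations. Finally, it assumes a penalty that subtracts $\varepsilon$ from $S$ at every template-to-background pair used in the previous matching. The helpers pad_centered, objective, match_fw, and diversify are hypothetical. Seeds, restarts, masking, and the paper's speed-ups are omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pad_centered(A, n):
    """Assumed centered padding: replace A by 2A - J, then zero-pad to n x n."""
    m = A.shape[0]
    out = np.zeros((n, n))
    out[:m, :m] = 2.0 * A - 1.0
    return out


def objective(A, B, S, lam, perm):
    """tr(A P B P^T) + lam * tr(S^T P) for the permutation with P[i, perm[i]] = 1."""
    P = np.eye(len(perm))[perm]
    return np.trace(A @ P @ B @ P.T) + lam * S[np.arange(len(perm)), perm].sum()


def match_fw(A, B, S, lam, n_iter=30):
    """Frank-Wolfe over doubly stochastic matrices, then pick the best permutation found."""
    n = A.shape[0]
    P = np.full((n, n), 1.0 / n)                  # start at the barycenter
    candidates = []
    for _ in range(n_iter):
        grad = A.T @ P @ B.T + A @ P @ B + lam * S
        _, q = linear_sum_assignment(grad, maximize=True)   # best vertex direction
        candidates.append(q)
        D = np.eye(n)[q] - P
        # Exact line search: f(P + a D) = f(P) + b a + c a^2 on a in [0, 1].
        b = np.trace(A @ D @ B @ P.T) + np.trace(A @ P @ B @ D.T) + lam * np.sum(S * D)
        c = np.trace(A @ D @ B @ D.T)
        if c < 0:
            alpha = float(np.clip(-b / (2 * c), 0.0, 1.0))
        else:
            alpha = 1.0 if b + c > 0 else 0.0
        if alpha == 0.0:
            break
        P = P + alpha * D
    _, proj = linear_sum_assignment(P, maximize=True)       # project the relaxed solution
    candidates.append(proj)
    # Return whichever candidate permutation scores best under the penalized objective.
    return max(candidates, key=lambda p: objective(A, B, S, lam, p))


def diversify(A, B, S, lam, eps, n_rounds):
    """Recover n_rounds matchings, down-weighting the pairs used in each round."""
    m, n = A.shape[0], B.shape[0]
    Ap = pad_centered(A, n)
    Bc = 2.0 * B - 1.0                            # centered background
    Sp = np.zeros((n, n))
    Sp[:m, :] = S                                 # padded rows carry no similarity
    matches = []
    for _ in range(n_rounds):
        perm = match_fw(Ap, Bc, Sp, lam)
        matches.append(perm[:m].copy())           # images of the m template nodes
        Sp[np.arange(m), perm[:m]] -= eps         # penalize the pairs just used
    return matches
```

The sketch returns the best of the visited Frank-Wolfe vertices and the final projection, rather than the projection alone. This design choice ensures that the returned matching is judged under the penalized objective, so a large eps steers later rounds away from earlier pairs.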

Paper Structure (18 sections, 7 theorems, 58 equations, 12 figures, 1 table)

Introduction
Multiple Correlated Erdős-Rényi
Notations
Solution diversification
Theoretical benefits of down-weighting
Experimental results
Two overlapping templates
Three overlapping templates
MRI Brain data
Template discovery in TKBs
Conclusion and discussion
Appendix
Proof of Theorem 2
More Experiments
Additional two overlapping templates experiments
...and 3 more sections

Key Result

Theorem 1

Let $A$ and $B$ be graphs as described above. Suppose we observe only the edges of $A$ and $B$ and have no additional knowledge of vertex-based graph features. Then, with probability at least $1-n^{-2}$, we have that $\underset{P \in \Pi_n}{\operatorname{argmax}}\,\operatorname{tr}\left(\cdots\right)$ …

Figures (12)

Figure 1: We fix $k=10$ and use the seeded GMMF algorithm to match $A$ with $B$, using 5 seeds randomly selected from the overlapping nodes of $B^{(1)}$ and $B^{(2)}$, as described in the Two overlapping templates section. We plot recovery results over $\varepsilon$ (the penalty applied to the stronger of the two embedded templates) and $\lambda$, averaged over 20 Monte Carlo simulations. Stronger colors indicate better recovery of the embedded templates: t1 (blue) denotes template 1 and t2 (red) denotes template 2. White squares indicate that neither template was recovered or that both templates were recovered equally often across the 20 simulations.
Figure 2: We fix $k=10,\lambda=25$ and use the seeded GMMF algorithm with centered padding to match $A$ with $B$, using 5 seeds randomly selected from the overlapping nodes of $B^{(1)}, B^{(2)}$ and $B^{(3)}$. Here $B^{(1)}, B^{(2)}$ and $B^{(3)}$ are induced subgraphs of $B$ such that $A$ and $B$ follow the multiple correlated ER model described in the Three overlapping templates section. We plot recovery results over $\varepsilon_1$ (the penalty applied to the diagonal elements of $S^{(11)}, S^{(22)}$) and $\varepsilon_2$ (the penalty applied to the diagonal elements of $S^{(13)}, S^{(22)}$), averaged over 20 Monte Carlo simulations. The patterns indicate which template was recovered in the majority of simulations: t1 for template 1, t2 for template 2, and t3 for template 3. White squares indicate that none of the three templates was recovered.
Figure 3: We run our proposed algorithm using the seeded GMMF algorithm with 500 restarts and 5 seeds selected from the node pairs $\{(s_j,w_j)\}_{j=1}^6$, as described in the MRI Brain data section, and take the result with the highest objective function value (the padded GMP objective) as the output. For each $\varepsilon$, we compute the GM objective function value (left axis) of the resulting matching with the template. We also compute the objective function value of aligning the template to the same classified brain region in the left hemisphere of $B$ (Left-to-Left in the plot), and to the symmetric region in the right hemisphere of $B$ (Left-to-Right in the plot). For $\varepsilon>0$, we additionally report the number of novel nodes recovered in each matching relative to the subgraph detected with $\varepsilon=0$ (right axis).
Figure 4: We run 32 random restarts of the GMMF algorithm for each template recovery and plot the empirical CDF of the graph edit distance (GED) of the recovered templates. Colors indicate different penalization values.
Figure 5: We fix $k=15$ and use the seeded GMMF algorithm to match $A$ with $B$, using 5 seeds randomly selected from the overlapping nodes of $B^{(1)}$ and $B^{(2)}$, as described in the Two overlapping templates section. We plot recovery results over $\varepsilon$ (the penalty applied to the stronger of the two embedded templates) and $\lambda$, averaged over 20 Monte Carlo simulations. Blue means the recovered template is closer to $B^{(1)}$ (the stronger embedded template), red means it is closer to $B^{(2)}$ (the weaker embedded template), and white means either a tie across the 20 simulations or a recovered template close to neither $B^{(1)}$ nor $B^{(2)}$.
...and 7 more figures
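
The captions above describe seeded GMMF runs with random restarts, keeping the result with the highest objective value. The sketch below illustrates that protocol using SciPy's quadratic_assignment with the FAQ method, seeds passed as partial_match, and randomized initializations. It is a hedged sketch, not the authors' pipeline. It assumes centered padding and omits the similarity term $\lambda \operatorname{tr}(S^\top P)$, which SciPy's QAP solver does not support. The helper name best_of_restarts is hypothetical.

```python
import numpy as np
from scipy.optimize import quadratic_assignment


def best_of_restarts(A, B, seeds, n_restarts=20, seed=0):
    """Hypothetical helper: seeded FAQ with random restarts, keeping the best run.

    A is an m x m template and B an n x n background (m <= n); seeds is an
    (s, 2) integer array of known (template node, background node) pairs.
    """
    m, n = A.shape[0], B.shape[0]
    Ap = np.zeros((n, n))
    Ap[:m, :m] = 2.0 * A - 1.0        # assumed centered padding of the template
    Bc = 2.0 * B - 1.0                # centered background
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        res = quadratic_assignment(
            Ap, Bc, method="faq",
            options={"maximize": True, "partial_match": seeds,
                     "P0": "randomized", "rng": rng},
        )
        if best is None or res.fun > best.fun:   # keep the highest objective value
            best = res
    return best.col_ind[:m], best.fun
```

Each restart starts FAQ from a different random doubly stochastic matrix, so keeping the best objective value across restarts reduces sensitivity to poor local optima. The seeded pairs are held fixed in every run.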

Theorems & Definitions (16)

Definition 1
Definition 2
Remark 1
Theorem 1
Theorem 2
Remark 2
proof
Proposition 1
proof
Proposition 2
...and 6 more
