Accelerating Graph Similarity Search through Integer Linear Programming

Andrea D'Ascenzo, Julian Meffert, Petra Mutzel, Fabrizio Rossi

TL;DR

The paper tackles graph similarity search under a Graph Edit Distance (GED) threshold. Because exact GED computation is NP-hard, it introduces an ILP-based approach built on the FORI formulation, together with a hierarchy of lower bounds that prune candidates efficiently in a filter-and-verification framework. It proves that the LP relaxation of FORI yields a lower bound that dominates the existing branch match-based lower bound, and it uses a star-cycle construction to show that the gap between the LP-based and traditional bounds can grow arbitrarily with graph size. The authors then develop FORI-sim, a threshold-aware verification algorithm that applies a sequence of lower bounds (LS, BM, FORI-LP) and a threshold constraint to terminate early, significantly accelerating GED verification. Experiments on real datasets (AIDS, MUTA, PROT) show that FORI-sim outperforms the state-of-the-art A$^*$-BMAO for most thresholds, particularly for larger thresholds and non-uniform edit costs, suggesting good scalability and robustness for large graph databases.
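The filter-and-verification flow with a hierarchy of lower bounds can be sketched as follows, assuming "similar" means GED at most the threshold $\tau$. The names `similarity_search`, `bounds`, and `verify` are hypothetical placeholders: in FORI-sim the bounds would be LS, BM, and FORI-LP and verification would solve the threshold-aware ILP, none of which is implemented here. The point is only the control flow: bounds run cheapest first, and a costlier bound is evaluated only for candidates that survived the cheaper ones.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

Graph = TypeVar("Graph")


def similarity_search(
    query: Graph,
    database: Iterable[Graph],
    tau: float,
    bounds: Sequence[Callable[[Graph, Graph], float]],
    verify: Callable[[Graph, Graph, float], bool],
) -> List[Graph]:
    """Hypothetical sketch: return database graphs whose GED to query is within tau."""
    answers: List[Graph] = []
    for g in database:
        # Filter: bounds are ordered cheapest first; any() stops at the first
        # bound that exceeds tau, so costlier bounds run only on survivors.
        if any(lb(query, g) > tau for lb in bounds):
            continue
        # Verify: an exact (possibly threshold-aware) check on remaining candidates.
        if verify(query, g, tau):
            answers.append(g)
    return answers
```

A toy usage with integers standing in for graphs: `similarity_search(5, [1, 4, 5, 7, 20], 2, [lambda q, g: abs(q - g) / 2, lambda q, g: abs(q - g)], lambda q, g, t: abs(q - g) <= t)` keeps `[4, 5, 7]`, and the second bound is never evaluated for `20` because the first one already exceeds the threshold.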

Abstract

The Graph Edit Distance (GED) is an important metric for measuring the similarity between two (labeled) graphs. It is defined as the minimum cost required to transform one graph into the other through a sequence of (elementary) edit operations. Its usefulness for assessing the similarity of large graphs is limited by the complexity of its exact computation, which is NP-hard in theory and challenging in practice. This difficulty can be mitigated by switching to Graph Similarity Search under GED constraints, which decides whether the edit distance between two graphs is below a given threshold. A popular framework for answering Graph Similarity Search queries under GED constraints over a graph database is the filter-and-verification framework: filtering discards unpromising graphs, while verification certifies the similarity between each remaining graph and the query graph. To improve the filtering step, we define a lower bound based on an integer linear programming formulation. We prove that this lower bound dominates the effective branch match-based lower bound and can also be computed efficiently. Building on this, we propose a graph similarity search algorithm that uses a hierarchy of lower bound algorithms and solves a novel integer programming formulation that exploits the threshold parameter. Extensive computational experiments on a well-established test bed show that our approach significantly outperforms the state-of-the-art algorithm for most of the examined thresholds.
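To make the GED definition concrete, here is a brute-force sketch under unit costs, where every vertex or edge insertion, deletion, or relabeling costs 1. It enumerates every way to map the vertices of $G$ to distinct vertices of $H$ or to deletion and keeps the cheapest resulting edit path. This exhaustive search is exponential, is for illustration only, and is not the ILP approach of the paper. The graph representation (a list of vertex labels plus a dict from `frozenset({i, j})` edges to edge labels) and the name `ged_unit` are assumptions.

```python
from itertools import permutations


def ged_unit(g_labels, g_edges, h_labels, h_edges):
    """Exact GED under unit costs by exhaustive search (toy graphs only).

    Graphs: a list of vertex labels plus a dict {frozenset({i, j}): edge_label}.
    """
    n, m = len(g_labels), len(h_labels)
    # Each G-vertex maps to a distinct H-vertex or to None (= deleted).
    targets = list(range(m)) + [None] * n
    best = float("inf")
    for phi in permutations(targets, n):
        # Vertex deletions and relabelings.
        cost = sum(u is None or g_labels[i] != h_labels[u] for i, u in enumerate(phi))
        # H-vertices not hit by phi are inserted.
        cost += m - sum(u is not None for u in phi)
        matched = set()
        for e, lab in g_edges.items():
            i, j = tuple(e)
            img = None if phi[i] is None or phi[j] is None else frozenset((phi[i], phi[j]))
            if img in h_edges:
                matched.add(img)
                cost += lab != h_edges[img]  # edge relabeling (0 or 1)
            else:
                cost += 1  # edge deletion
        cost += len(h_edges) - len(matched)  # edge insertions
        best = min(best, cost)
    return best
```

For example, turning a labeled triangle into a path with the same vertex and edge labels costs one edge deletion, so `ged_unit` returns 1 for that pair.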

Paper Structure

This paper contains 13 sections, 3 theorems, 12 equations, 12 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.1

Let $G=(V_G,E_G)$ and $H=(V_H,E_H)$ be arbitrary labeled graphs with the unit cost function. Then the value of the branch match-based lower bound (BM) is not larger than that of FORI-LP.
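Why this dominance matters for filtering: assuming FORI is an exact ILP formulation of GED, its LP relaxation (a relaxation of a minimization problem) cannot exceed the integer optimum. Combined with Theorem 4.1 (BM is never larger than FORI-LP under unit costs), this gives the chain below, so every pair that BM prunes for a threshold $\tau$ is also pruned by FORI-LP, while FORI-LP may prune additional pairs.

$$
\mathrm{BM}(G,H) \;\le\; \mathrm{FORI\text{-}LP}(G,H) \;\le\; GED(G,H),
\qquad
\mathrm{BM}(G,H) > \tau \;\Longrightarrow\; \mathrm{FORI\text{-}LP}(G,H) > \tau \;\Longrightarrow\; GED(G,H) > \tau .
$$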

Figures (12)

  • Figure 1: Graph similarity search.
  • Figure 2: The graph edit distance $GED(G,H)$ between $G$ and $H$ under unit edit costs is 5.
  • Figure 3: Formulation FORI [DAscenzoMMR25].
  • Figure 4: ILP formulation (F1) [Lerouge2017].
  • Figure 5: The dual of the FORI LP-relaxation.
  • ...and 7 more figures

Theorems & Definitions (8)

  • Definition 3.1: Graph Similarity Search
  • Definition 4.1: Label set-based lower bound [blumenthal2017exact]
  • Definition 4.2: Branch match-based lower bound [zheng2014efficient]
  • Theorem 4.1
  • Proof
  • Theorem 4.2
  • Proof
  • Corollary 4.2.1
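The list names the label set-based (LS) and branch match-based (BM) lower bounds without stating them. Below is a minimal sketch of common unit-cost versions, assuming undirected graphs given as a list of vertex labels plus a dict from `frozenset({i, j})` edges to edge labels. LS sums the multiset distance $\Gamma(A,B)=\max(|A|,|B|)-|A\cap B|$ over vertex labels and over edge labels. For BM, a branch is a vertex label together with the labels of its incident edges; the sketch uses a branch cost of vertex relabel cost plus half the edge-label distance (each edge lies in two branches) and a minimum-cost branch assignment with dummy insertions and deletions. This halved-edge variant follows the "branch" bound found in the GED literature and may differ in detail or scaling from Definition 4.2. The helper names `gamma`, `ls_bound`, `branches`, and `bm_bound` are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical representation: vertex labels in a list; undirected edges as
# {frozenset({i, j}): edge_label}. Unit costs throughout.


def gamma(a, b):
    """Unit-cost distance between label multisets: max(|A|, |B|) - |A ∩ B|."""
    a, b = Counter(a), Counter(b)
    return max(sum(a.values()), sum(b.values())) - sum((a & b).values())


def ls_bound(g_labels, g_edges, h_labels, h_edges):
    """Label set-based bound: vertex-label plus edge-label multiset distance."""
    return gamma(g_labels, h_labels) + gamma(g_edges.values(), h_edges.values())


def branches(labels, edges):
    """Branch of v = (label of v, list of labels of edges incident to v)."""
    incident = [[] for _ in labels]
    for e, lab in edges.items():
        for v in e:
            incident[v].append(lab)
    return list(zip(labels, incident))


def bm_bound(g_labels, g_edges, h_labels, h_edges):
    """Branch-based bound: min-cost branch assignment, edge part halved
    because every edge belongs to two branches. Brute force: toy graphs only."""
    bg, bh = branches(g_labels, g_edges), branches(h_labels, h_edges)
    n, m = len(bg), len(bh)
    size, inf = n + m, float("inf")
    # (n+m) x (n+m) matrix: substitutions top-left, deletions on the top-right
    # diagonal, insertions on the bottom-left diagonal, dummy-dummy zeros.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                (lu, eu), (lv, ev) = bg[i], bh[j]
                cost[i][j] = (lu != lv) + gamma(eu, ev) / 2
            elif i < n:
                cost[i][j] = 1 + len(bg[i][1]) / 2 if j - m == i else inf
            elif j < m:
                cost[i][j] = 1 + len(bh[j][1]) / 2 if i - n == j else inf
    return min(sum(cost[i][p[i]] for i in range(size))
               for p in permutations(range(size)))
```

Enumerating permutations keeps the sketch dependency-free but is factorial in $|V_G|+|V_H|$; a practical implementation would solve the same assignment problem with the Hungarian algorithm, for instance via `scipy.optimize.linear_sum_assignment`.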