Fast Maximum Common Subgraph Search: A Redundancy-Reduced Backtracking Approach
Kaiqiang Yu, Kaixin Wang, Cheng Long, Laks Lakshmanan, Reynold Cheng
TL;DR
This work tackles the NP-hard maximum common subgraph (MCS) problem by introducing RRSplit, a backtracking algorithm that combines redundancy-reducing reductions with a tighter upper bound to prune the search space. The core innovations are vertex-equivalence-based reductions, maximality-based reductions, and a new vertex-equivalence-based upper bound, all supported by an exclusion structure that tracks past explorations. The authors prove a worst-case time complexity of $O^*((|V_G|+1)^{|V_Q|})$, matching the best-known theoretical bound, and show through extensive experiments on four benchmark graph collections that RRSplit outperforms state-of-the-art McSplit-based methods by several orders of magnitude. The results indicate strong potential for scalable exact MCS computation and suggest directions for extending the approach to labeled graphs and broader graph domains.
Abstract
Given two input graphs, finding the largest subgraph that occurs in both, i.e., finding the maximum common subgraph, is a fundamental operator for evaluating the similarity between two graphs in graph data analysis. Existing works on the problem are of either theoretical or practical interest, but not both. Specifically, algorithms with a theoretical guarantee on the running time are known not to be efficient in practice, while algorithms following the recently proposed backtracking framework McSplit run fast in practice but have no theoretical guarantees. In this paper, we propose a new backtracking algorithm called RRSplit, which both achieves better practical efficiency and provides a non-trivial theoretical guarantee on the worst-case running time. To achieve the former, we develop a series of reductions and upper bounds for reducing redundant computations, i.e., the time spent exploring unpromising branches that hold no maximum common subgraph. To achieve the latter, we formally prove that RRSplit has a worst-case time complexity that matches the best-known complexity for the problem. Finally, we conduct extensive experiments on four benchmark graph collections, and the results demonstrate that our algorithm outperforms the practical state of the art by several orders of magnitude.
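To make the problem statement concrete, the following is a minimal sketch of a plain backtracking search for a maximum common subgraph. It is neither RRSplit nor McSplit: it has none of the reductions or the vertex-equivalence-based bound described above, and it assumes the induced-subgraph variant of the problem on unlabeled, undirected graphs. The adjacency-set representation and the names `Graph`, `max_common_induced_subgraph`, and `extend` are illustrative assumptions. Each vertex of `Q` is either mapped to one of the vertices of `G` or left unmatched, so the search tree has at most $(|V_G|+1)^{|V_Q|}$ leaves, which is one way to see where a bound of this shape can come from (the paper's own analysis may differ).

```python
from typing import Dict, Set

# Hypothetical representation: vertex -> set of neighbours (undirected graph).
Graph = Dict[int, Set[int]]


def max_common_induced_subgraph(Q: Graph, G: Graph) -> Dict[int, int]:
    """Return a largest vertex mapping Q -> G that preserves edges and non-edges."""
    q_vertices = list(Q)
    best: Dict[int, int] = {}

    def extend(i: int, mapping: Dict[int, int], used: Set[int]) -> None:
        nonlocal best
        # Trivial upper bound: every remaining vertex of Q could still be matched.
        if len(mapping) + (len(q_vertices) - i) <= len(best):
            return
        if i == len(q_vertices):
            # The bound check above guarantees this mapping is strictly larger.
            best = dict(mapping)
            return
        u = q_vertices[i]
        for v in G:
            if v in used:
                continue
            # u -> v is consistent if adjacency agrees with every matched pair.
            if all((u2 in Q[u]) == (v2 in G[v]) for u2, v2 in mapping.items()):
                mapping[u] = v
                used.add(v)
                extend(i + 1, mapping, used)
                del mapping[u]
                used.remove(v)
        # Branch in which u is left unmatched.
        extend(i + 1, mapping, used)

    extend(0, {}, set())
    return best


# Usage: a 3-vertex path versus a triangle share only a single edge.
path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(len(max_common_induced_subgraph(path, triangle)))  # 2
```

The only pruning here is the trivial count of remaining vertices. A tighter upper bound and the avoidance of branches that repeat earlier explorations are the kinds of improvements the abstract attributes to RRSplit.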
