On Finding All Connected Maximum-Sized Common Subgraphs in Multiple Labeled Graphs
Johannes B. S. Petersen, Akbar Davoodi, Thomas Gärtner, Marc Hellmuth, Daniel Merkle
TL;DR
This work tackles the problem of finding all maximum vertex common subgraphs (MVCS) and maximum edge common subgraphs (MECS) across multiple labeled graphs, including their connected variants, a problem of particular relevance to bioinformatics and cheminformatics. It introduces an exact framework that builds modular product graphs and uses a Bron-Kerbosch-based enumeration to list all maximal cliques corresponding to MVCS/MECS, augmented with pruning and a graph-ordering heuristic derived from graph-kernel and minmax similarities. The method extends to vertex- and edge-labeled graphs and to type-A connected cliques, and it handles Δ-Y ambiguities in line graphs (a triangle and a three-leaf star have the same line graph) via an inverse mapping; several of its components are also parallelizable, which improves practicality. Empirical evaluation on large molecular datasets demonstrates scalability and speedups from the proposed ordering and pruning techniques, and an open-source implementation is available for reproducibility and broader adoption.
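The core idea of reducing common-subgraph search to clique enumeration can be illustrated for the simplest case of two vertex-labeled graphs. The sketch below is a textbook construction, not the authors' implementation: it assumes a hypothetical representation of each graph as a pair `(labels, edges)`, builds the modular product (pairs of equally labeled vertices, joined when the two original pairs are either adjacent in both graphs or non-adjacent in both), and enumerates maximal cliques with classical Bron-Kerbosch pivoting. The largest cliques correspond to maximum common induced subgraphs; multiple graphs, edge labels, connectivity constraints and pruning are deliberately omitted.

```python
from itertools import combinations


def modular_product(g1, g2):
    """Modular product of two vertex-labeled simple graphs.

    Assumed representation: a graph is (labels, edges), where labels maps
    vertex -> label and edges is a set of frozensets {u, v}.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    # Product vertices: pairs of vertices that carry the same label.
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    adj = {p: set() for p in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # each original vertex may be matched at most once
        in_g1 = frozenset((u1, u2)) in edges1
        in_g2 = frozenset((v1, v2)) in edges2
        if in_g1 == in_g2:  # adjacent in both graphs, or non-adjacent in both
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def bron_kerbosch(adj):
    """Enumerate all maximal cliques using Bron-Kerbosch with pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex covering most candidates to skip redundant branches.
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for w in list(p - adj[pivot]):
            expand(r | {w}, p & adj[w], x & adj[w])
            p = p - {w}
            x = x | {w}

    expand(frozenset(), set(adj), set())
    return cliques


def maximum_common_subgraphs(g1, g2):
    """Return every maximum clique, i.e. every vertex mapping of maximum size."""
    cliques = bron_kerbosch(modular_product(g1, g2))
    best = max(len(c) for c in cliques)
    return [c for c in cliques if len(c) == best]


# Toy molecules: C-C-O versus C-O-C (hypothetical example data).
g1 = ({1: "C", 2: "C", 3: "O"}, {frozenset({1, 2}), frozenset({2, 3})})
g2 = ({"a": "C", "b": "O", "c": "C"}, {frozenset({"a", "b"}), frozenset({"b", "c"})})

for clique in maximum_common_subgraphs(g1, g2):
    print(sorted(clique, key=str))
```

Here both maximum solutions map a single C-O bond, and the enumeration reports each of them rather than an arbitrary one, which mirrors the requirement of listing all maximum common subgraphs.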
Abstract
We present an exact algorithm for computing all common subgraphs with the maximum number of vertices across multiple graphs. Our approach is further extended to the connected Maximum Common Subgraph (MCS) problem, identifying the largest connected common subgraph in terms of either vertices or edges across multiple graphs. Vertices and edges may additionally be labeled, for example with atom types and bond types, respectively, the classical labeling used for molecular graphs. Our approach leverages modular product graphs and a modified Bron-Kerbosch algorithm to enumerate maximal cliques, ensuring all intermediate solutions are retained. A pruning heuristic efficiently reduces the size of the modular product, improving computational feasibility. Additionally, we introduce a graph ordering strategy based on graph-kernel similarity measures to optimize the search process. Our method is particularly relevant for bioinformatics and cheminformatics, where identifying conserved structural motifs in molecular graphs is crucial. Empirical results on molecular datasets demonstrate that our approach is scalable and fast.
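A graph ordering driven by kernel similarity can be sketched with the MinMax similarity, which for non-negative count vectors x and y is the sum of element-wise minima divided by the sum of element-wise maxima. The features and the ordering rule below are assumptions made for illustration only: `label_features` (hypothetical) counts vertex labels and unordered label pairs on edges as a stand-in for the paper's graph-kernel features, and `greedy_order` (hypothetical) starts with the most similar pair of graphs and then repeatedly appends the graph most similar to the last one chosen. The paper's actual ordering strategy may differ.

```python
from collections import Counter
from itertools import combinations


def label_features(labels, edges):
    """Hypothetical feature map: vertex-label counts plus edge label-pair counts."""
    feats = Counter(labels.values())
    for e in edges:
        u, v = tuple(e)
        feats[tuple(sorted((labels[u], labels[v])))] += 1
    return feats


def minmax(x, y):
    """MinMax similarity of two count vectors stored as Counters."""
    # Counter '&' keeps element-wise minima, '|' keeps element-wise maxima.
    denom = sum((x | y).values())
    return sum((x & y).values()) / denom if denom else 1.0


def greedy_order(feats):
    """Hypothetical greedy ordering of graph indices by MinMax similarity."""
    n = len(feats)
    if n < 2:
        return list(range(n))
    sim = {(i, j): minmax(feats[i], feats[j]) for i, j in combinations(range(n), 2)}
    i, j = max(sim, key=sim.get)  # start from the most similar pair
    order, remaining = [i, j], set(range(n)) - {i, j}
    while remaining:
        last = order[-1]
        # Sorting makes tie-breaking deterministic (smallest index wins).
        nxt = max(sorted(remaining), key=lambda k: sim[(min(last, k), max(last, k))])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical example data: C-C-O, C-O-C and C-C-C as (labels, edges) pairs.
path = {frozenset({1, 2}), frozenset({2, 3})}
graphs = [
    ({1: "C", 2: "C", 3: "O"}, path),
    ({1: "C", 2: "O", 3: "C"}, path),
    ({1: "C", 2: "C", 3: "C"}, path),
]
feats = [label_features(*g) for g in graphs]
print(greedy_order(feats))
```

Here C-C-O and C-O-C share the most labeled features (MinMax similarity 4/6), so they are placed first; processing similar graphs early is one plausible way to keep intermediate common subgraphs informative, but this rationale is an assumption rather than a claim from the source.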
