Efficient Hypergraph Pattern Matching via Match-and-Filter and Intersection Constraint
Siwoo Song, Wonseok Shin, Kunsoo Park, Giuseppe F. Italiano, Zhengyi Yang, Wenjie Zhang
TL;DR
The paper addresses hypergraph pattern matching, an NP-hard problem, by introducing MaCH, a framework that combines a novel intersection constraint, a candidate hyperedge space (CHS), and a Match-and-Filter approach to prune the search space during backtracking. The key ideas are the formalization of three constraints (Hyperedge Signature, Connectivity, and Intersection) and the demonstration that the Intersection Constraint provides a necessary and sufficient condition for valid embeddings via efficient cell-based verification. Empirically, MaCH significantly outperforms state-of-the-art methods (HGMatch, OHMiner, GuP) on real and large-scale hypergraphs, achieving speedups of up to orders of magnitude and lower memory usage. The work suggests that the intersection constraint and the CHS/Match-and-Filter paradigm may apply beyond hypergraph pattern matching.
Abstract
A hypergraph is a generalization of a graph, in which a hyperedge can connect multiple vertices, modeling complex relationships involving multiple vertices simultaneously. Hypergraph pattern matching, which is to find all isomorphic embeddings of a query hypergraph in a data hypergraph, is one of the fundamental problems. In this paper, we present a novel algorithm for hypergraph pattern matching by introducing (1) the intersection constraint, a necessary and sufficient condition for valid embeddings, which significantly speeds up the verification process, (2) the candidate hyperedge space, a data structure that stores potential mappings between hyperedges in the query hypergraph and the data hypergraph, and (3) the Match-and-Filter framework, which interleaves matching and filtering operations to maintain only compatible candidates in the candidate hyperedge space during backtracking. Experimental results on real-world datasets demonstrate that our algorithm significantly outperforms the state-of-the-art algorithms, by up to orders of magnitude in terms of query processing time.
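To make the problem statement concrete, the following is a minimal brute-force sketch of hypergraph pattern matching. It is not the MaCH algorithm and uses none of its pruning techniques. It assumes unlabeled hypergraphs given as lists of vertex sets, and it treats an embedding as an injective vertex mapping that sends every query hyperedge onto some data hyperedge. The function name `naive_hypergraph_match` and the input format are hypothetical choices made for illustration.

```python
from itertools import permutations


def naive_hypergraph_match(query_edges, data_edges):
    """Enumerate injective vertex mappings that send every query
    hyperedge onto a data hyperedge (assumed, unlabeled semantics)."""
    q_edges = [frozenset(e) for e in query_edges]
    d_edges = {frozenset(e) for e in data_edges}
    q_vertices = sorted(set().union(*q_edges))
    d_vertices = sorted(set().union(*d_edges))

    embeddings = []
    # Try every injective assignment of query vertices to data vertices.
    for image in permutations(d_vertices, len(q_vertices)):
        f = dict(zip(q_vertices, image))
        # Keep the mapping only if each mapped query hyperedge exists in the data.
        if all(frozenset(f[v] for v in e) in d_edges for e in q_edges):
            embeddings.append(f)
    return embeddings


# Query: two size-2 hyperedges sharing vertex "b".
query = [{"a", "b"}, {"b", "c"}]
# Data: two size-2 hyperedges sharing vertex 2, plus one size-3 hyperedge.
data = [{1, 2}, {2, 3}, {3, 4, 5}]
result = naive_hypergraph_match(query, data)
```

This enumeration tries every injective vertex assignment, so its cost grows factorially with the number of query vertices. That blow-up is the search space that candidate filtering and cheaper embedding verification, as described in the abstract, aim to cut down.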
