A customizable inexact subgraph matching algorithm for attributed graphs
Tatyana Benko, Rebecca Jones, Lucas Tate
TL;DR
The paper tackles inexact subgraph matching on attributed graphs, where noise and variation in node and edge attributes complicate exact matching. It introduces a configurable DFS-based algorithm that searches the target graph, evaluating node mappings with a tunable graph edit distance cost function that blends node similarity, edge consistency, and a look-ahead term. Start nodes are selected via attribute similarity and local neighborhood checks, which focuses the exploration, while backtracking and pruning steer the search toward near-optimal mappings. Empirical results on family tree graphs and binary control-flow graphs demonstrate the method’s ability to identify patterns under noise, with the strongest performance on small query graphs and reasonable scalability to target graphs with tens of thousands of nodes.
Abstract
Graphs provide a natural way to represent data by encoding information about objects and the relationships between them. As the amount of data collected and generated grows, it is often necessary to locate specific patterns of relationships between objects in a graph. Given a larger target graph and a smaller query graph, one may wish to identify instances of the query graph in the target graph. This task is called subgraph identification or subgraph matching. Subgraph matching is helpful in areas such as bioinformatics, binary analysis, pattern recognition, and computer vision. In these applications, datasets frequently contain noise and errors, so exact subgraph matching algorithms do not apply. In this paper we introduce a new customizable algorithm for inexact subgraph matching. Our algorithm uses node and edge attributes, which are often present in real-world datasets, to narrow down the search space. Because it pairs nodes using a modifiable graph edit distance cost function, the algorithm is flexible in both the type of subgraph matching it can perform and the types of datasets it can process. We show its effectiveness on family tree graphs and control-flow graphs.
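To make the idea of pairing nodes under a modifiable cost function concrete, the following is a minimal illustrative sketch, not the implementation evaluated in the paper. It assumes undirected graphs stored as plain dictionaries, a hypothetical "label" attribute on each node, and a simple cost that charges w_node for an attribute mismatch and w_edge for each query edge with no counterpart in the target. The function names (label_mismatch, inexact_match) and both weights are assumptions made for illustration. Start-node selection and the look-ahead term are omitted, and the depth-first search uses plain branch-and-bound pruning.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

Node = Hashable
Attrs = Dict[Node, dict]      # node -> attribute dictionary
Adj = Dict[Node, Set[Node]]   # node -> set of neighbors (symmetric)


def label_mismatch(q_attr: dict, t_attr: dict) -> float:
    """Hypothetical node cost: 0 if the 'label' attributes agree, else 1."""
    return 0.0 if q_attr.get("label") == t_attr.get("label") else 1.0


def inexact_match(
    q_attrs: Attrs, q_adj: Adj, t_attrs: Attrs, t_adj: Adj,
    node_cost: Callable[[dict, dict], float] = label_mismatch,
    w_node: float = 1.0, w_edge: float = 1.0,
) -> Tuple[Optional[Dict[Node, Node]], float]:
    """Return the cheapest injective query-to-target mapping and its cost."""
    order = list(q_attrs)  # fixed visiting order for the query nodes
    best = {"cost": float("inf"), "mapping": None}

    def dfs(i: int, mapping: Dict[Node, Node], used: Set[Node], cost: float) -> None:
        if cost >= best["cost"]:
            return  # prune: this branch cannot beat the best mapping so far
        if i == len(order):
            best["cost"], best["mapping"] = cost, dict(mapping)
            return
        q = order[i]
        for t in t_attrs:
            if t in used:
                continue
            # Node term: attribute dissimilarity of the candidate pair.
            step = w_node * node_cost(q_attrs[q], t_attrs[t])
            # Edge term: penalize query edges to already-mapped neighbors
            # that have no corresponding edge in the target graph.
            for qn in q_adj.get(q, ()):
                if qn in mapping and mapping[qn] not in t_adj.get(t, ()):
                    step += w_edge
            mapping[q] = t
            used.add(t)
            dfs(i + 1, mapping, used, cost + step)
            del mapping[q]      # backtrack
            used.discard(t)

    dfs(0, {}, set(), 0.0)
    return best["mapping"], best["cost"]


# Query: a single edge between an "x" node and a "y" node.
query_attrs = {"a": {"label": "x"}, "b": {"label": "y"}}
query_adj = {"a": {"b"}, "b": {"a"}}

# Noisy target: the neighbor of node 1 carries the wrong label "w",
# while the correctly labeled node 3 is not connected to node 1.
target_attrs = {1: {"label": "x"}, 2: {"label": "w"}, 3: {"label": "y"}}
target_adj = {1: {2}, 2: {1}, 3: set()}

mapping, cost = inexact_match(
    query_attrs, query_adj, target_attrs, target_adj, w_edge=2.0
)
```

With w_edge set higher than w_node, this sketch prefers the structurally consistent pair with a noisy label over the correctly labeled but disconnected node. Changing the weights or swapping in a different node_cost changes which kind of deviation the search tolerates, which is the sense in which such a cost function is customizable.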
