
FLEXIS: FLEXible Frequent Subgraph Mining using Maximal Independent Sets

Akshit Sharma, Sam Reinher, Dinesh Mehta, Bo Wu

TL;DR

This work addresses the computational bottlenecks in frequent subgraph mining (FSM) on single graphs by introducing FLEXIS, a framework that (i) generates k-vertex candidates by merging two frequent (k-1)-vertex patterns, thus pruning the search space, and (ii) uses a Maximal Independent Set–based metric, mal, with a tunable λ (lambda) slider that controls how closely the metric tracks MIS. The method combines a vertex-centric generation strategy with a modified VF3Light matcher that enforces vertex-disjoint embeddings, enabling faster pruning and early termination. The authors prove a theoretical bound relating maximal and maximum independent sets and show that mal provides a flexible accuracy-speed trade-off. Empirically, FLEXIS achieves substantial speedups (on average about 10.6x over GraMi and 3x over T-FSM) and favorable memory behavior across multiple real-world datasets. Overall, FLEXIS offers a practical, tunable, and more scalable FSM solution for applications that need precise control over pattern overlap and a smaller candidate space.
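
As a rough illustration of how a maximal independent set can be computed cheaply and cut short, the sketch below greedily keeps embeddings that share no data-graph vertex with earlier picks (the vertex-disjointness the modified matcher enforces). It stops as soon as the count reaches a λ-scaled threshold. The helper name mal_support, the representation of an embedding as a tuple of data-graph vertex IDs, and the rule "frequent iff count ≥ ⌈λ·τ⌉" are illustrative assumptions, not the paper's exact definition of mal.

```python
import math
from typing import Iterable, Sequence, Tuple


def mal_support(embeddings: Iterable[Sequence[int]], tau: int, lam: float) -> Tuple[int, bool]:
    """Greedy vertex-disjoint embedding count with early termination.

    Hypothetical helper: `lam` scales the support threshold `tau`, and the
    pattern is accepted as soon as the greedy count reaches ceil(lam * tau).
    Returns (count, is_frequent).
    """
    target = math.ceil(lam * tau)  # relaxed threshold set by the lambda slider
    if target <= 0:
        return 0, True  # a non-positive threshold is trivially met
    used = set()  # data-graph vertices already claimed by a chosen embedding
    count = 0
    for emb in embeddings:
        if used.isdisjoint(emb):  # independent of every embedding chosen so far
            used.update(emb)
            count += 1
            if count >= target:  # early termination: stop enumerating embeddings
                return count, True
    # Full scan finished: the chosen embeddings form a maximal independent set.
    return count, False


embs = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
print(mal_support(embs, tau=3, lam=1.0))  # (3, True)
print(mal_support(embs, tau=4, lam=1.0))  # (3, False)
```

A rejection only happens after a full scan, so the count is then the size of a maximal independent set. Theorem 1 implies that any such set satisfies m ≥ M/n. Under this assumed rule, λ = 1/n would therefore never reject a pattern whose exact MIS support reaches τ. Larger λ values demand more disjoint embeddings from the greedy pass and may reject some patterns that an exact MIS count would accept.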

Abstract

Frequent Subgraph Mining (FSM) is the process of identifying common subgraph patterns whose frequency exceeds a predefined threshold. Although FSM is widely applicable in fields such as bioinformatics, chemical analysis, and social network anomaly detection, it remains time-consuming and complex. This complexity stems from the need both to find candidate high-frequency subgraphs and to determine whether they exceed the threshold. Current approaches to identifying these patterns often rely on edge or vertex extension, which can introduce redundancy and increase latency. To address these challenges, this paper introduces a novel approach that generates potential k-vertex patterns by combining two frequent (k-1)-vertex patterns. This method optimizes the breadth-first search, allowing earlier search termination based on vertex count and support value. Another challenge in FSM is validating a candidate pattern against the threshold. Existing metrics, such as Maximum Independent Set (MIS) and Minimum Node Image (MNI), either demand significant computation time or risk overestimating pattern counts. Our approach stays aligned with MIS by identifying independent subgraphs. Through the "Maximal Independent Set" metric, this paper offers an efficient solution that reduces latency and gives users control over pattern overlap. In extensive experiments, the proposed method achieves an average speedup of 10.58x over GraMi and an average speedup of 3x over T-FSM.
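
The candidate-generation idea can be pictured as a simplified join. Two frequent (k-1)-vertex patterns that share the same (k-2)-vertex core are combined into one k-vertex candidate, so every candidate is built only from frequent parts. The sketch below makes several assumptions:

- Patterns are undirected and labeled.
- A fixed vertex order is used, and the vertex to drop is always the last one.
- Because the inputs do not determine the edge between the two non-shared vertices, both variants are returned.

The names core and merge are hypothetical. FLEXIS's actual merge process (Figure 5) works with CoreGraphs on directed labeled graphs and is not reproduced here.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]  # undirected edge stored as (u, v) with u < v
Pattern = Tuple[Tuple[str, ...], FrozenSet[Edge]]  # (vertex labels, edges)


def core(p: Pattern) -> Pattern:
    """Drop the last vertex and its incident edges, leaving the shared core."""
    labels, edges = p
    last = len(labels) - 1
    return labels[:-1], frozenset(e for e in edges if last not in e)


def merge(p1: Pattern, p2: Pattern) -> List[Pattern]:
    """Join two (k-1)-vertex patterns with an identical (k-2)-vertex core.

    The k-vertex result keeps the core, p1's last vertex (id k-2) and p2's
    last vertex (renumbered to id k-1). The inputs say nothing about an edge
    between those two vertices, so both variants are returned.
    """
    if core(p1) != core(p2):
        return []  # no shared core under this fixed vertex order
    labels1, edges1 = p1
    labels2, edges2 = p2
    k = len(labels1) + 1
    old = len(labels2) - 1  # p2's last vertex; always the larger endpoint of its edges
    moved = frozenset((u, k - 1) if v == old else (u, v) for u, v in edges2)
    labels = labels1 + (labels2[-1],)
    base = edges1 | moved
    return [(labels, base), (labels, base | {(k - 2, k - 1)})]


p1 = (("A", "B"), frozenset({(0, 1)}))  # edge A-B
p2 = (("A", "C"), frozenset({(0, 1)}))  # edge A-C
for cand in merge(p1, p2):
    print(cand)
```

Here the two 2-vertex patterns share the core ("A",). The join yields the 3-vertex path B-A-C and the triangle on A, B, C. In an FSM loop, each returned candidate would still need its support checked.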

Paper Structure (42 sections, 6 theorems, 1 equation, 13 figures, 3 tables, 5 algorithms)

Key Result

theorem 1

Consider a pattern graph with $n$ vertices. Let $m$ denote the number of mappings in a maximal independent set (mIS), and let $M$ denote the number of mappings in a maximum independent set (MIS). Then $m \le M \le mn$.
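
The following is one standard argument for the bound, not necessarily the paper's proof. It assumes, as the vertex-disjoint matcher suggests, that two mappings conflict when they share a data-graph vertex.

- Lower bound: m ≤ M holds because a maximum independent set is, by definition, at least as large as any independent set.
- Upper bound: let S be a maximal independent set (|S| = m) and T a maximum one (|T| = M). Every mapping in T shares at least one vertex with some mapping in S, since otherwise it could be added to S; a mapping in both sets trivially shares vertices with itself. Each mapping in S covers n distinct vertices, and the mappings in T are pairwise vertex-disjoint. So at most n mappings of T can touch any one mapping of S, giving M ≤ mn.

The brute-force check below illustrates the bound on a toy instance where it is tight. The helper names and the tuple-of-vertices representation of a mapping are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Mapping = Tuple[int, ...]  # data-graph vertices covered by one pattern mapping


def overlap(a: Mapping, b: Mapping) -> bool:
    """Two mappings conflict when they share a data-graph vertex."""
    return bool(set(a) & set(b))


def greedy_maximal(mappings: Sequence[Mapping]) -> List[Mapping]:
    """One pass, keeping each mapping that conflicts with none kept so far."""
    chosen: List[Mapping] = []
    for mp in mappings:
        if not any(overlap(mp, c) for c in chosen):
            chosen.append(mp)
    return chosen


def maximum_size(mappings: Sequence[Mapping]) -> int:
    """Exact maximum independent set size by brute force (toy inputs only)."""
    for r in range(len(mappings), 0, -1):
        for subset in combinations(mappings, r):
            if not any(overlap(a, b) for a, b in combinations(subset, 2)):
                return r
    return 0


# Pattern: a single edge (n = 2). Data graph: the path 0-1-2-3.
n = 2
maps = [(1, 2), (0, 1), (2, 3)]  # scan order chosen to make greedy unlucky
m, M = len(greedy_maximal(maps)), maximum_size(maps)
print(m, M, m <= M <= m * n)  # 1 2 True
```

The greedy pass picks the middle edge (1, 2), which blocks both other mappings, so m = 1. The exact optimum picks (0, 1) and (2, 3), so M = 2 = mn.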

Figures (13)

  • Figure 1: The double arrow represents directed edges in both directions. Labels are denoted by vertex colors.
  • Figure 2: CoreGraphs & extended CoreGraph for pattern $P_1$
  • Figure 3: All possible Mappings for pattern $P_1$
  • Figure 4: Maximal independent set visualization
  • Figure 5: Merge process
  • ...and 8 more figures

Theorems & Definitions (11)

  • theorem 1
  • proof
  • lemma 1
  • lemma 2
  • proof
  • lemma 3
  • proof
  • lemma 4
  • proof
  • theorem 2
  • ...and 1 more