Efficient Parallel Genetic Algorithm for Perturbed Substructure Optimization in Complex Network

Shanqing Yu, Meng Zhou, Jintao Zhou, Minghao Zhao, Yidan Song, Yao Lu, Zeyu Wang, Qi Xuan

TL;DR

This work tackles the computational burden of genetic-algorithm-based Perturbed Substructure Optimization (PSSO) in graphs by introducing GAPA, a PyTorch-based acceleration framework. GAPA reconstructs GA operations as batch-friendly matrix computations, designs a parallel-friendly fitness evaluation, and provides four acceleration modes (S, SM, M, MNM) to exploit hardware parallelism. It supports 4 graph-mining tasks and 10 GA-based PSSO algorithms, validated on multiple datasets with substantial speedups over the Evox baseline (nearly 4x on average) while preserving solution quality. The framework offers an extensible library and practical guidance for adaptive distributed acceleration, enabling scalable GA-based PSSO in real-world networks.

Abstract

Evolutionary computing, and the genetic algorithm (GA) in particular, is a combinatorial optimization approach inspired by natural selection and the transmission of genetic information. It is widely used to identify optimal solutions to complex problems through simulated programming and iteration. Owing to its strong adaptability, flexibility, and robustness, GA has shown significant performance and potential in perturbed substructure optimization (PSSO), an important graph mining problem that achieves its goals by modifying network structures. However, the efficiency and practicality of GA-based PSSO face enormous challenges due to the complexity and diversity of application scenarios. While some research has explored acceleration frameworks for evolutionary computing, their performance on PSSO remains limited because they lack scenario generalizability. To address these issues, this paper is the first to present the GA-based PSSO Acceleration framework (GAPA), which simplifies the GA development process and supports distributed acceleration. Specifically, it reconstructs the genetic operations and designs a development framework for efficient parallel acceleration. GAPA also includes an extensible library that optimizes and accelerates 10 PSSO algorithms, covering 4 crucial graph mining tasks. Comprehensive experiments on 18 datasets across 4 tasks and 10 algorithms demonstrate the superiority of GAPA, which achieves on average 4x the acceleration of Evox. The repository is available at https://github.com/NetAlsGroup/GAPA.
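
To make the idea of reconstructing genetic operations as matrix computations concrete, the following is a minimal sketch, not GAPA's actual code. It uses NumPy rather than PyTorch for portability, and every name in it (`crossover`, `mutate`, `elitism`, `toy_fitness`, `run`) is hypothetical. The uniform crossover, the mutation rule, and the toy fitness function are assumptions chosen for illustration; the paper's own operators and PSSO fitness functions may differ. The elite size is set equal to the population size.

```python
import numpy as np

def crossover(pop, rng, rate=0.8):
    """Uniform crossover of every row with a random partner row (assumed operator)."""
    partners = pop[rng.permutation(len(pop))]
    # A row takes part in crossover with probability `rate`; genes swap with p = 0.5.
    swap = (rng.random(pop.shape) < 0.5) & (rng.random((len(pop), 1)) < rate)
    return np.where(swap, partners, pop)

def mutate(pop, gene_pool_size, rng, rate=0.1):
    """Replace each gene by a random gene from the pool with probability `rate`."""
    random_genes = rng.integers(0, gene_pool_size, size=pop.shape)
    mask = rng.random(pop.shape) < rate
    return np.where(mask, random_genes, pop)

def elitism(pop, fitness, size):
    """Keep the `size` individuals with the highest fitness."""
    order = np.argsort(-fitness)[:size]
    return pop[order], fitness[order]

def toy_fitness(pop, gene_scores):
    """Hypothetical fitness: sum of per-gene scores, computed for all rows at once."""
    return gene_scores[pop].sum(axis=1)

def run(generations=20, pop_size=6, genome_len=3, gene_pool_size=8, seed=0):
    rng = np.random.default_rng(seed)
    gene_scores = np.arange(gene_pool_size, dtype=float)  # toy scores for g_1..g_8
    # Each row of the population matrix is one individual (a set of gene indices).
    # This toy version allows repeated genes within an individual.
    pop = rng.integers(0, gene_pool_size, size=(pop_size, genome_len))
    fit = toy_fitness(pop, gene_scores)
    history = [fit.max()]
    for _ in range(generations):
        children = mutate(crossover(pop, rng), gene_pool_size, rng)
        child_fit = toy_fitness(children, gene_scores)
        merged = np.concatenate([pop, children])
        merged_fit = np.concatenate([fit, child_fit])
        pop, fit = elitism(merged, merged_fit, pop_size)
        history.append(fit.max())
    return pop, fit, history
```

Because parents and offspring are merged before elitism, the best fitness in `history` never decreases. No operator loops over individuals in Python, which is the property that lets such operations map onto batched tensor computation on accelerators.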

Paper Structure (22 sections, 17 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Acceleration framework of GAPA for genetic operations. GAPA is an acceleration framework designed to optimize the genetic operations: population initialization, crossover, mutation, fitness calculation, and elitism. The figure sets the elite size equal to the initial population size and uses a network of eight nodes as an example to illustrate GAPA's optimization strategy. Here, the gene pool consists of network edges or nodes, depending on the type of perturbation, and is represented by $\{ g_1, g_2, \cdots, g_8\}$; $individual_i$ denotes the genome of the $i$-th perturbation, and $population$ denotes the perturbation population. Initialization transforms the perturbation population into a perturbation matrix $POP$, so that crossover and mutation can be performed through simple matrix operations. Finally, based on the values returned by the fitness function, elitism retains the individuals with the best perturbation effects, which are then used in the next iteration.
  • Figure 2: Workflow of SM mode acceleration. In this mode, the ${individual}^{s}$ are evenly distributed across $pn$ processes to calculate fitness.
  • Figure 3: Workflow of M mode acceleration. In this mode, the Initialize, Mutation, and Cal_Fit operators are applied to the population in separate processes. $main\_process$ collects data from the $pn$ processes and redistributes it after the Elitism, Selection, and Crossover operators.
  • Figure 4: Workflow of MNM mode acceleration. Specifically, this mode extends the M mode by adding distributed acceleration to the fitness calculation, which follows the same distributed acceleration approach as the SM mode.
  • Figure 5: Illustration of the additional computation time introduced by multiple processes. This extra time includes the data exchange between processes and the creation and destruction of processes during each iteration. Specifically, ① represents the S mode, ①③ represents the SM mode, ①② represents the M mode, and ①②③④ represents the MNM mode.
  • ...and 2 more figures
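
The even distribution of individuals over $pn$ workers described for the SM mode (Figure 2) can be sketched as follows. This is an illustrative sketch under stated assumptions, not GAPA's implementation: the helper names (`split_evenly`, `chunk_fitness`, `parallel_fitness`) and the toy fitness are hypothetical, and a thread pool from the Python standard library stands in for the separate processes or devices a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items, pn):
    """Split items into pn contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), pn)
    chunks, start = [], 0
    for i in range(pn):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks

def chunk_fitness(chunk):
    """Hypothetical fitness: the sum of gene indices of each individual."""
    return [sum(individual) for individual in chunk]

def parallel_fitness(population, pn):
    """Evaluate fitness chunk by chunk on pn workers and gather results in order."""
    chunks = split_evenly(population, pn)
    with ThreadPoolExecutor(max_workers=pn) as pool:
        # map() yields results in the order of `chunks`, so the original
        # ordering of individuals is preserved after flattening.
        results = list(pool.map(chunk_fitness, chunks))
    return [value for part in results for value in part]
```

For example, `parallel_fitness([[0, 1], [2, 3], [4, 5]], pn=2)` evaluates the first two individuals on one worker and the third on another, then returns the fitness values in the original order. The gather step is where the inter-process data exchange cost discussed for Figure 5 would arise in a multi-process setting.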