Network Alignment

Rui Tang, Ziyun Yong, Shuyu Jiang, Xingshu Chen, Yaofang Liu, Yi-Cheng Zhang, Gui-Quan Sun, Wei Wang

TL;DR

The survey defines network alignment as identifying cross-network correspondences for shared entities, formalizing the problem with seeds and evaluation frameworks. It organizes methods into structure-consistency and machine-learning-based families, detailing local/global strategies, network embedding (Euclidean and hyperbolic spaces), GNNs, and feature-extraction approaches. It then explores alignment under attributes, heterogeneity, directionality, dynamics, and seeds-free scenarios, illustrating domain-specific applications in social networks, bioinformatics, computational linguistics, and privacy. The discussion points to future directions including reinforcement learning, adversarial learning, meta-learning, diffusion models, and the role of higher-order interactions in advancing robust, scalable alignment across complex systems.

Abstract

Complex networks are frequently employed to model physical or virtual complex systems. When certain entities exist across multiple systems simultaneously, unveiling their corresponding relationships across the networks becomes crucial. This problem is known as network alignment. Solving it enhances our understanding of the structure and behaviour of complex systems, facilitates the validation and extension of theoretical physics research on complex systems, and fosters practical applications across various fields. However, due to variations in the structure, characteristics, and properties of complex networks across different fields, the study of network alignment is often isolated within each domain, with even the terminologies and concepts lacking uniformity. This review comprehensively summarizes the latest advancements in network alignment research, focusing on the characteristics and progress of network alignment in domains such as social network analysis, bioinformatics, computational linguistics and privacy protection. It provides a detailed analysis of the implementation principles, processes, and performance differences of various methods, including structure consistency-based methods, network embedding-based methods, and graph neural network-based (GNN-based) methods. Additionally, methods for network alignment under different conditions, such as in attributed networks, heterogeneous networks, directed networks, and dynamic networks, are presented. Finally, the challenges and open issues for future studies are discussed.

Paper Structure

This paper contains 37 sections, 76 equations, 20 figures.

Figures (20)

  • Figure 1: Illustration of network alignment. (a) Network alignment in social network analysis. Suppose there are two OSN platforms. Several tasks in this field, such as cross-platform account linkage, can be framed as network alignment problems. (b) Network alignment in bioinformatics. Suppose there are two species, and the proteins within their cells interact with one another. These protein interactions form PPI networks for the different species. Several tasks in this field, such as ortholog identification, can be transformed into network alignment problems. (c) Network alignment in computational linguistics. Suppose there are two knowledge graphs in different languages. Several tasks in this field, such as cross-language node linkage, can be translated into network alignment problems. (d) Network alignment in privacy protection. Suppose an organization shares anonymized network data of its employees for data-sharing purposes. Meanwhile, an attacker collects data on the organization's personnel through web scraping. The problem of de-anonymizing the shared data can be transformed into a network alignment problem. More problems related to network alignment across various fields can be found in Sec. \ref{Sec: Network alignment in different fields}. Source: Reproduced from Refs. tang2020interlayer, burke2023towards, zhang2019multi, shao2019fast.
  • Figure 2: The statistics of academic papers published over the past decade related to network alignment in different fields: (a) social network analysis, (b) bioinformatics, (c) computational linguistics, and (d) privacy protection. In each subfigure, the left part displays the total number of publications, authors, and citations. The top-right part shows the number of papers published by researchers from various countries, while the bottom-right part highlights the number of papers published in different representative journals or conferences.
  • Figure 3: Example of the local structure consistency-based methods. There are two networks $G^{l_1}$ and $G^{l_2}$. The nodes connected by dashed lines represent observed corresponding nodes, while the others are unmatched. $u^{l_1}_1$ and $u^{l_2}_1$ share three corresponding node pairs, $u^{l_1}_1$ and $u^{l_2}_2$ share one pair, and $u^{l_1}_2$ and $u^{l_2}_1$ share two pairs. For local structure consistency-based methods, the structural similarity between $u^{l_1}_1$ and $u^{l_2}_1$ is higher due to the greater number of shared corresponding node pairs, making them more likely to be considered as corresponding nodes.
  • Figure 4: Illustration of calculating the similarity score in IsoRank. There are two small networks. For each possible pair of unmatched nodes across the two networks, the similarity score is constrained to depend recursively on the scores of the neighbouring node pairs. Source: Reproduced from Ref. singh2007pairwise.
  • Figure 5: Example of probabilistic and deterministic networks. The top subfigure represents a probabilistic network, while the four subfigures below are deterministic networks derived from it based on the existence probability of each edge. The number above each deterministic network indicates the probability of obtaining that deterministic network from the probabilistic network. Source: Reproduced from Ref. todor2012probabilistic.
  • ...and 15 more figures
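
The local structure-consistency idea described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration, not the survey's implementation: the function name `shared_seed_pairs`, the node labels, and the toy adjacency data are all hypothetical, chosen only so that the counts match those stated in the caption (three, one, and two shared corresponding pairs).

```python
from itertools import product


def shared_seed_pairs(adj1, adj2, seeds, u, v):
    """Count observed corresponding pairs (a, b) such that a is a
    neighbour of u in the first network and b is a neighbour of v
    in the second network."""
    return sum(1 for a, b in seeds.items() if a in adj1[u] and b in adj2[v])


# Hypothetical toy data (an assumption, not taken from the paper).
# Neighbour sets of the unmatched nodes in each network.
adj1 = {"u1": {"a", "b", "c"}, "u2": {"b", "c"}}        # network l1
adj2 = {"v1": {"a'", "b'", "c'"}, "v2": {"a'"}}         # network l2

# Observed corresponding nodes (the dashed lines in the figure).
seeds = {"a": "a'", "b": "b'", "c": "c'"}

# Score every candidate pair of unmatched nodes.
scores = {
    (u, v): shared_seed_pairs(adj1, adj2, seeds, u, v)
    for u, v in product(adj1, adj2)
}

# The pair sharing the most corresponding neighbours is the best candidate.
best_pair = max(scores, key=scores.get)
```

Here `scores` ranks candidate pairs by the number of shared corresponding neighbours, so the pair playing the role of $u^{l_1}_1$ and $u^{l_2}_1$ is selected. Practical methods typically normalize this count (for example by node degree), which the sketch omits.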
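
The relationship between a probabilistic network and its deterministic networks, as described in the Figure 5 caption, can be illustrated with a short enumeration. The sketch assumes that edges exist independently of one another, which the caption does not state explicitly; the function name `enumerate_deterministic_networks`, the edge labels, and the probabilities are hypothetical. Two uncertain edges are used so that four deterministic networks result, as in the figure.

```python
from itertools import product


def enumerate_deterministic_networks(edge_probs):
    """Return (edge_set, probability) for every deterministic network
    derivable from a probabilistic network.

    Assumes edges exist independently, so the probability of one
    deterministic network is the product of p for each present edge
    and (1 - p) for each absent edge.
    """
    edges = list(edge_probs)
    networks = []
    for mask in product([True, False], repeat=len(edges)):
        prob = 1.0
        present = []
        for edge, keep in zip(edges, mask):
            p = edge_probs[edge]
            prob *= p if keep else 1.0 - p
            if keep:
                present.append(edge)
        networks.append((frozenset(present), prob))
    return networks


# Hypothetical probabilistic network with two uncertain edges.
edge_probs = {("a", "b"): 0.9, ("b", "c"): 0.5}
networks = enumerate_deterministic_networks(edge_probs)
total_probability = sum(prob for _, prob in networks)
```

Because the enumeration covers every combination of present and absent edges, the probabilities sum to one. The number of deterministic networks grows as $2^m$ for $m$ uncertain edges, so exhaustive enumeration is only feasible for very small examples.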