Efficient Algorithms for Attributed Graph Alignment with Vanishing Edge Correlation

Ziao Wang, Weina Wang, Lele Wang

TL;DR

This work tackles exact graph alignment under vanishing edge correlation by introducing attribute information and a localized, attribute-enhanced subgraph-counting approach. It shows that with a small amount of attribute data, polynomial-time algorithms can achieve exact recovery even when ρ_u = n^{-Θ(1)}, by counting local trees that connect users to attribute anchors and forming robust similarity features. The authors derive polynomial-time feasible regions for both almost exact and exact recovery, and propose two refinement regimes (AttrSparse and AttrRich) to upgrade almost exact outputs to exact alignment. The results extend the computationally feasible landscape beyond constant-correlation regimes and demonstrate practical benefits of attributes for seeded-like graph alignment tasks, with implications for real-world social networks and related applications.

Abstract

Graph alignment refers to the task of finding the vertex correspondence between two correlated graphs of $n$ vertices. Extensive study has been done on polynomial-time algorithms for the graph alignment problem under the Erdős-Rényi graph pair model, where the two graphs are Erdős-Rényi graphs with edge probability $q_\mathrm{u}$, correlated under a certain vertex correspondence. To achieve exact recovery of the correspondence, all existing algorithms require at least that the edge correlation coefficient $\rho_\mathrm{u}$ between the two graphs be non-vanishing as $n\rightarrow\infty$. Moreover, it is conjectured that no polynomial-time algorithm can achieve exact recovery under vanishing edge correlation $\rho_\mathrm{u}<1/\mathrm{polylog}(n)$. In this paper, we show that with a vanishing amount of additional attribute information, exact recovery is polynomial-time feasible under vanishing edge correlation $\rho_\mathrm{u} \ge n^{-\Theta(1)}$. We identify a local tree structure, which incorporates one layer of user information and one layer of attribute information, and apply the subgraph counting technique to such structures. A polynomial-time algorithm is proposed that recovers the vertex correspondence for most of the vertices, and then refines the output to achieve exact recovery. The consideration of attribute information is motivated by real-world applications like LinkedIn and Twitter, where user attributes like birthplace and education background can aid alignment.
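
To make the notion of edge correlation concrete, the following is a minimal sketch of one standard way to realize a pair of edge indicators that are each Bernoulli with probability $q_\mathrm{u}$ and have correlation coefficient $\rho_\mathrm{u}$. It is an illustration under stated assumptions, not the paper's construction: the function names `joint_probabilities` and `sample_correlated_edge_pair` are hypothetical, and the only fact used is that two Bernoulli($q$) variables with correlation $\rho$ are both equal to 1 with probability $q^2 + \rho q(1-q)$.

```python
import random


def joint_probabilities(q, rho):
    """Joint law (p11, p10, p01, p00) of two Bernoulli(q) indicators
    with correlation coefficient rho (hypothetical helper)."""
    p11 = q * q + rho * q * (1.0 - q)  # edge present in both graphs
    p10 = q - p11                      # present only in the first graph
    p01 = q - p11                      # present only in the second graph
    p00 = 1.0 - p11 - p10 - p01        # absent in both graphs
    return p11, p10, p01, p00


def sample_correlated_edge_pair(q, rho, rng):
    """Draw one pair (a, b) of correlated edge indicators."""
    p11, p10, p01, _ = joint_probabilities(q, rho)
    u = rng.random()
    if u < p11:
        return 1, 1
    if u < p11 + p10:
        return 1, 0
    if u < p11 + p10 + p01:
        return 0, 1
    return 0, 0


if __name__ == "__main__":
    rng = random.Random(0)
    q, rho = 0.3, 0.5  # illustrative values, not taken from the paper
    print(joint_probabilities(q, rho))
    print([sample_correlated_edge_pair(q, rho, rng) for _ in range(5)])
```

As $\rho$ tends to 0 the joint law approaches that of two independent indicators, which is the vanishing-correlation regime the abstract refers to.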

Paper Structure

This paper contains 18 sections, 13 theorems, 191 equations, and 8 figures.

Key Result

Theorem 1

Consider an attributed Erdős--Rényi graph pair model ${\mathcal{G}}(n,{q_\mathrm{u}},{\rho_\mathrm{u}};m,{q_\mathrm{a}},{\rho_\mathrm{a}})$. Suppose for some positive integer $k\ge 3$. Then with high probability, Algorithm alg:subgraph-count with parameters $k$ and for some constant $0<c<1$ outputs an index set $I$ and a vertex correspondence estimate $\hat{\Pi}:I\rightarrow [n]$ satisfying that

Figures (8)

  • Figure 1: Comparison between polynomial-time feasible regions with (a) no attribute information and (b) vanishing attribute information. Panel (a) shows the case $m{q_\mathrm{a}}{\rho_\mathrm{a}}=0$. The green region represents the best-known polynomial-time feasible region. The portion above the red dotted line is attainable using the algorithm by mao2023, while the section to the right of the black dotted line is achievable with the algorithm introduced by ding2023polynomialtime. It is further conjectured by sophie2023matching and mao2022testing that no polynomial-time algorithm achieves exact recovery in the red region. Panel (b) is an example of vanishing attribute information with $m=\sqrt{n}$, ${q_\mathrm{a}}=\frac{n^{-7/16}}{\sqrt{\log n}}$ and ${\rho_\mathrm{a}}=n^{-1/16}$. The green region is feasible by the proposed algorithm in this work. Note that, for clarity, the ${\rho_\mathrm{u}}$ axis in the plots is not scaled linearly.
  • Figure 2: Comparison of subgraphs counted in mao2023 and in this work. One panel provides an example of the attributed subgraph we propose to count. In this example, we have $k=3$ branches, and each red node represents an attribute. The other panel provides an example of a chandelier. The structures enclosed by the ovals are the bulbs.
  • Figure 3: An example of computing $\omega_1(S)$. In graph $S$, edge $(4,a_3)$ appears in $G_1$, while $(2,4)$ and $(2,a_1)$ do not. Therefore, these three edges have weights $1-{q_\mathrm{a}}, -{q_\mathrm{u}}$ and $-{q_\mathrm{a}}$ respectively, and the weight of $S$ in $G_1$ is $\omega_1(S)=(1-{q_\mathrm{a}}){q_\mathrm{u}}{q_\mathrm{a}}$.
  • Figure 4: Structure of the union graph of $(S_1,S_2,T_1,T_2)\in \Lambda_{ii}^{(0,2)}$. In this example, the number of attributes in each subgraph is $k=4$. The black solid lines represent user-user edges shared by $S_1,S_2$, the blue solid lines represent user-user edges shared by $T_1,T_2$, the red solid lines represent user-user edges shared by $S_1,S_2,T_1,T_2$, the black dotted lines represent user-attribute edges shared by $S_1,S_2$ and the blue dotted lines represent user-attribute edges shared by $T_1,T_2$.
  • Figure 5: Illustration of the four types of shared attributes in the union graph. In this figure, the red nodes represent attributes and the black nodes represent users. The label beside each edge indicates which of the graphs $S_1,S_2,T_1,T_2$ the edge belongs to.
  • ...and 3 more figures
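
The weight computation described in the Figure 3 caption can be sketched in a few lines. This is an illustration only, assuming the rule stated in that caption: an edge of $S$ contributes its indicator in $G_1$ minus the corresponding edge probability (${q_\mathrm{u}}$ for user-user edges, ${q_\mathrm{a}}$ for user-attribute edges), and $\omega_1(S)$ is the product over the edges of $S$. The names `edge_weight` and `subgraph_weight`, the edge encoding, and the numeric values of `q_u` and `q_a` are hypothetical.

```python
import math


def edge_weight(present, q):
    """Centered indicator: 1 - q if the edge is present, -q otherwise."""
    return (1.0 if present else 0.0) - q


def subgraph_weight(subgraph_edges, graph_edges, q_u, q_a):
    """Product of centered edge weights of a subgraph S inside a graph G.

    subgraph_edges: iterable of (u, v, kind), kind is "user" or "attr".
    graph_edges: set of frozenset({u, v}) for the edges present in G.
    """
    weight = 1.0
    for u, v, kind in subgraph_edges:
        q = q_u if kind == "user" else q_a
        present = frozenset((u, v)) in graph_edges
        weight *= edge_weight(present, q)
    return weight


if __name__ == "__main__":
    # The three edges of S named in the Figure 3 caption.
    S = [(4, "a3", "attr"), (2, 4, "user"), (2, "a1", "attr")]
    # Of these, only (4, a3) appears in G_1.
    G1 = {frozenset((4, "a3"))}
    q_u, q_a = 0.1, 0.2  # illustrative values, not taken from the paper
    w = subgraph_weight(S, G1, q_u, q_a)
    # Matches the closed form (1 - q_a) * q_u * q_a from the caption.
    assert math.isclose(w, (1 - q_a) * q_u * q_a)
    print(w)
```

The two negative factors from the absent edges multiply to a positive value, which is why the caption's product carries no minus sign.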

Theorems & Definitions (28)

  • Theorem 1: Almost exact recovery
  • Theorem 2: Exact recovery
  • Remark 1: Interpretation of conditions for exact recovery
  • Remark 2: Complexity of Algorithm alg:subgraph-count
  • Remark 3: Choice of parameter $k$
  • Remark 4: Intuition behind the proposed statistics
  • Remark 5: Complexity of Algorithm alg:refine-sparse
  • Remark 6: Time complexity of Algorithm alg:refine-rich
  • Proposition 1: Expectation of similarity score
  • Proposition 2
  • ...and 18 more