Refined Graph Encoder Embedding via Self-Training and Latent Community Recovery
Cencheng Shen, Jonathan Larson, Ha Trinh, Carey E. Priebe
TL;DR
This work improves vertex embeddings beyond standard spectral methods and the original graph encoder embedding (GEE) by refining GEE through a linear transformation and iterative latent community recovery. The proposed Refined Graph Encoder Embedding (R-GEE) applies linear discriminant analysis (LDA) to produce a self-trained embedding, then iteratively uncovers latent communities and outputs the concatenated refined embeddings together with updated labels. Theoretically, the GEE embedding is shown to be asymptotically normal under the stochastic block model, and the LDA transformation is shown to estimate the conditional distribution P(Y|X). These results indicate when refinement should be applied, and latent refinement improves margin separation in the cases where it is beneficial. Empirically, simulations and real-graph experiments show improved vertex classification and meaningful latent community recovery while keeping running time linear in graph size, offering a theoretically grounded alternative to more opaque deep learning approaches.
Abstract
This paper introduces a refined graph encoder embedding method, enhancing the original graph encoder embedding through linear transformation, self-training, and hidden community recovery within observed communities. We provide the theoretical rationale for the refinement procedure, demonstrating how and why our proposed method can effectively identify useful hidden communities under stochastic block models. Furthermore, we show how the refinement method leads to improved vertex embedding and better decision boundaries for subsequent vertex classification. The efficacy of our approach is validated through numerical experiments, which exhibit clear advantages in identifying meaningful latent communities and improved vertex classification across a collection of simulated and real-world graph data.
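To make the first two ingredients named above concrete, the following is a minimal sketch of an encoder embedding followed by a linear discriminant step, run on a simulated two-block stochastic block model. It is not the authors' implementation. It assumes the usual GEE construction Z = AW, where W[i, k] = 1/n_k if vertex i carries label k and 0 otherwise, and it uses a plain shared-covariance Gaussian LDA written in NumPy. The function names (`simulate_sbm`, `gee`, `lda_posterior`) and the block probabilities are hypothetical choices for illustration, and the iterative latent community recovery step of R-GEE is not reproduced here.

```python
import numpy as np

def simulate_sbm(sizes, B, rng):
    """Sample a symmetric, hollow adjacency matrix from a stochastic block model."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = B[labels][:, labels]                      # edge probability per vertex pair
    upper = np.triu(rng.random(P.shape) < P, k=1) # sample each pair once, no self-loops
    A = (upper | upper.T).astype(float)
    return A, labels

def gee(A, labels, K):
    """Assumed GEE construction: Z = A W with W[i, k] = 1/n_k when labels[i] == k."""
    n = len(labels)
    counts = np.bincount(labels, minlength=K)
    W = np.zeros((n, K))
    W[np.arange(n), labels] = 1.0 / counts[labels]
    return A @ W

def lda_posterior(Z, labels, K):
    """Estimate P(Y = k | Z) under a Gaussian model with a shared covariance."""
    n = len(labels)
    priors = np.bincount(labels, minlength=K) / n
    means = np.vstack([Z[labels == k].mean(axis=0) for k in range(K)])
    centered = Z - means[labels]
    cov_inv = np.linalg.pinv(centered.T @ centered / (n - K))
    # Linear discriminant score: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log pi_k
    quad = 0.5 * np.sum(means @ cov_inv * means, axis=1)
    scores = Z @ cov_inv @ means.T - quad + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    post = np.exp(scores)
    return post / post.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
B = np.array([[0.30, 0.05],
              [0.05, 0.30]])                      # hypothetical block probabilities
A, y = simulate_sbm([100, 100], B, rng)
Z = gee(A, y, K=2)                                # n x K encoder embedding
post = lda_posterior(Z, y, K=2)                   # n x K posterior estimates
y_hat = post.argmax(axis=1)                       # self-trained labels
print("agreement with given labels:", (y_hat == y).mean())
```

In this sketch, vertices where `y_hat` disagrees with `y` are the ones a self-training step would flag. How R-GEE turns such disagreements into latent communities, and when it stops iterating, is determined by the method itself and is not specified by this example.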
