Network Layout Algorithm with Covariate Smoothing
Octavious Smiley, Till Hoffmann, Jukka-Pekka Onnela
TL;DR
The paper tackles robust visualization of networks with observation errors by using nodal covariates to inform edge probabilities. It introduces a model-based estimate $\hat{B}$ of dyadic connection probabilities, computed from covariates $X$ and the observed adjacency matrix $A$, and embeds this estimate into a modified Fruchterman-Reingold energy $Q_2$ with a smoothing parameter $\gamma \in [0,1]$, allowing a continuum between observed edges and covariate-based probabilities. A tuning metric $\psi_{\gamma}$, based on standardized cross-terms $m_{\gamma}$ and edge-length changes $e_{\gamma}$, guides the selection of $\gamma$. The method is validated on simulated networks (SBM and continuous covariates) and applied to real Add Health data. Results show increased clustering and layout robustness when covariates strongly predict connections. The paper also provides practical guidance on when to apply covariate smoothing versus standard FR, along with reproducibility resources.
Abstract
Network science studies connections among objects and is applied in diverse domains such as social interactions, fraud detection, and disease spread. Visualizing networks helps researchers conceptualize research questions and form scientific hypotheses. Because networks are high-dimensional mathematical objects, they require dimensionality reduction for (planar) visualization. Visualizing empirical networks presents additional challenges: they often contain false positive (spurious) and false negative (missing) edges. Traditional visualization methods do not account for these observation errors, which can bias interpretation. Moreover, contemporary network data often includes rich nodal attributes, yet traditional methods neglect these attributes when computing node locations. Our visualization approach leverages nodal attributes to compensate for these limitations of network data. We use a statistical model to estimate the probability of an edge between each pair of nodes based on their covariates. We then extend the Fruchterman-Reingold algorithm to incorporate the estimated dyad connection probabilities, allowing practitioners to balance reliance on observed versus estimated edges. We explore optimal smoothing levels, offering a natural way to include relevant nodal information in layouts. Results demonstrate the effectiveness of our method in achieving robust network visualization, providing insights for improved analysis.
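The balance between observed and estimated edges can be illustrated with a minimal sketch. It assumes a simple convex combination of the observed adjacency matrix $A$ and the estimated probabilities $\hat{B}$, with $\gamma = 0$ recovering the observed network and $\gamma = 1$ using only the covariate-based estimates. This blending form, the direction of $\gamma$, and the helper name `blend_weights` are assumptions for illustration; the exact form of the energy $Q_2$ is not given in this section and may differ.

```python
import numpy as np


def blend_weights(A, B_hat, gamma):
    """Hypothetical helper: blend observed edges with estimated probabilities.

    Assumes W = (1 - gamma) * A + gamma * B_hat, which is an illustrative
    choice and not necessarily the weighting used in the paper's Q_2.
    """
    A = np.asarray(A, dtype=float)
    B_hat = np.asarray(B_hat, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or A.shape != B_hat.shape:
        raise ValueError("A and B_hat must be square matrices of equal shape")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    W = (1.0 - gamma) * A + gamma * B_hat
    np.fill_diagonal(W, 0.0)  # ignore self-loops in the layout weights
    return W


# Toy example: three nodes, one observed edge between nodes 0 and 1.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
# Made-up estimated connection probabilities (illustrative values only).
B_hat = np.array([[0.0, 0.9, 0.1],
                  [0.9, 0.0, 0.2],
                  [0.1, 0.2, 0.0]])

W_half = blend_weights(A, B_hat, 0.5)
```

Under these assumptions, the blended matrix could be supplied as edge weights to any weighted force-directed layout, so that unobserved but probable dyads still exert some attraction.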
