
Grounding force-directed network layouts with latent space models

Felix Gaisbauer, Armin Pournaki, Sven Banisch, Eckehard Olbrich

TL;DR

This paper presents a principled way to ground force-directed network layouts in latent space models, making node positions interpretable as maximum-likelihood estimates of latent positions and model parameters. The authors derive force equations for unweighted, cumulative, and weighted networks from a latent-space probability model. The resulting Leipzig Layout produces layouts whose geometry reflects probabilistic tie formation. Validation includes comparisons with modularity-based interpretations and applications to real data: Facebook friendship networks, the Twitter follower network of German parliamentarians, the Twitter debate about the Harper's letter, and survey data. These show meaningful group separations and axis structures, while convergence to local minima remains a limitation. The approach provides a framework for spatialising data beyond traditional networks and a blueprint for extending force-directed layouts via alternative interaction models.
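As a worked illustration of how a likelihood turns into forces, take the logistic distance model of Hoff, Raftery and Handcock (2002). It is used here purely as an assumed example, and the link function behind the Leipzig Layout may differ. For a symmetric adjacency matrix $A$, node positions $x_i$, distances $d_{ij} = \lVert x_i - x_j \rVert$ and a density parameter $\alpha$:

$$
\operatorname{logit} p_{ij} = \alpha - d_{ij}, \qquad
\mathcal{L}(X, \alpha) = \sum_{i<j} \left[ A_{ij} \log p_{ij} + (1 - A_{ij}) \log (1 - p_{ij}) \right]
$$

$$
\frac{\partial \mathcal{L}}{\partial d_{ij}} = p_{ij} - A_{ij}, \qquad
\frac{\partial \mathcal{L}}{\partial x_i} = \sum_{j \neq i} (p_{ij} - A_{ij}) \, \frac{x_i - x_j}{d_{ij}}, \qquad
\frac{\partial \mathcal{L}}{\partial \alpha} = \sum_{i<j} (A_{ij} - p_{ij})
$$

Moving each node along $\partial \mathcal{L} / \partial x_i$ is then a force-directed step. A tie ($A_{ij} = 1$) contributes $p_{ij} - 1 < 0$ and pulls $x_i$ towards $x_j$. A non-tie contributes $p_{ij} > 0$ and pushes the two nodes apart. At a (possibly local) maximum of $\mathcal{L}$ these forces balance, which is why the final layout can be read as a maximum-likelihood estimate under the assumed model.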

Abstract

Force-directed layout algorithms are ubiquitous tools for network visualisation across a multitude of scientific disciplines. However, they lack a theoretical grounding that would allow their outcomes to be interpreted rigorously and guide the choice of a specific algorithm for a given data set. We propose an approach building on latent space models, which assume that the probability of two nodes forming a tie depends on their distance in an unobserved latent space. From such latent space models, we derive force equations for a force-directed layout algorithm. Since the forces infer positions that maximise the likelihood of the given network under the latent space model, the resulting layout becomes interpretable. We implement these forces for unweighted and weighted networks and spatialise several real-world networks. Comparison with existing layout algorithms (which are not grounded in an interpretable model) shows that node groups are placed in similar configurations. However, those algorithms separate nodes within clusters more strongly and tend to separate clusters more strongly in retweet networks. We also explore visualising data not traditionally regarded as network data, such as survey data.
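The following is a minimal Python sketch of a likelihood-derived layout. It assumes a logistic latent space model, logit p_ij = α − d_ij with Euclidean distance d_ij, which is an illustrative choice and not necessarily the model behind the Leipzig Layout. Positions and α are updated by plain gradient ascent on the log-likelihood, so every update is a force-directed step. The helper names `latent_space_layout`, `log_likelihood` and `two_block_sbm` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function: sigma(x) = (1 + tanh(x/2)) / 2
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def log_likelihood(pos, A, alpha):
    """Log-likelihood of a symmetric 0/1 adjacency matrix A under the
    assumed logistic latent space model logit(p_ij) = alpha - d_ij."""
    iu = np.triu_indices(A.shape[0], k=1)          # each pair i < j once
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)[iu]
    eta, a = alpha - d, A[iu]
    # log(sigma(eta)) = -log(1 + e^-eta), log(1 - sigma(eta)) = -log(1 + e^eta)
    return float(np.sum(-a * np.logaddexp(0.0, -eta)
                        - (1 - a) * np.logaddexp(0.0, eta)))


def latent_space_layout(A, dim=2, steps=2000, lr=0.01, seed=0):
    """Gradient ascent on the log-likelihood; the gradient acts as a force."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    pos = rng.normal(size=(n, dim))                # random initial layout
    alpha = 0.0
    for _ in range(steps):
        diff = pos[:, None, :] - pos[None, :, :]   # diff[i, j] = x_i - x_j
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                   # avoid 0/0 on the diagonal
        p = sigmoid(alpha - d)
        # Force on i from j along (x_i - x_j)/d_ij with weight p_ij - A_ij:
        # negative for ties (attraction), positive for non-ties (repulsion).
        coef = (p - A) / d
        np.fill_diagonal(coef, 0.0)
        pos += lr * np.einsum("ij,ijk->ik", coef, diff)
        # dL/dalpha = sum_{i<j} (A_ij - p_ij), scaled by 1/n for step stability
        alpha += lr * np.sum(np.triu(A - p, k=1)) / n
    return pos, alpha


def two_block_sbm(n_per_block, p_in, p_out, seed=0):
    """Hypothetical helper: symmetric adjacency of a two-block SBM."""
    rng = np.random.default_rng(seed)
    blocks = np.repeat([0, 1], n_per_block)
    probs = np.where(blocks[:, None] == blocks[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float), blocks
```

For example, `A, blocks = two_block_sbm(20, 0.6, 0.02)` followed by `pos, alpha = latent_space_layout(A)` should place the two blocks as separate groups. Each pairwise force uses the unit vector (x_i − x_j)/d_ij, so its magnitude is at most 1 and a fixed step size stays manageable. Fixed-step gradient ascent from a random start can still stop at a local optimum, mirroring the local-minima limitation mentioned in the TL;DR.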

Paper Structure

This paper contains 21 sections, 42 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Expected distance of an SBM (two blocks, 100 nodes each) with varying $p_{\text{out}}$, compared to the distance between the centers of mass of the clusters in the proposed layout algorithm, averaged over 5 runs (A). The distance inferred by the force-directed layout algorithm is nearly identical to the expected one throughout. In addition, the log-likelihood of the inferred latent space surpasses that of the ground truth in all cases (B). C gives the difference between the log-likelihood of the inferred positions and that of the ground truth for a Gaussian distribution of two groups of nodes ($\sigma = 1/12$, $d = 5/6$, averaged over 3 runs). In all cases, the log-likelihood of the inferred latent space surpasses the ground truth (i.e. the difference in log-likelihood is positive). Nevertheless, the inferred distances between nodes remain highly similar to the ground-truth distances (increasingly so with more nodes), as shown by the average Pearson correlation between the distance matrices (D).
  • Figure 2: Friendship network of students of Haverford College (top) and California Institute of Technology (bottom), colored by year (left) and residence (right). The Haverford layout arranges students in layers by year (A, chronologically ordered from top to bottom, with first-year students colored pink, second-year students colored green, etc.; dark grey nodes correspond to students whose year is unknown). First-year students are visually separated from the others, and the layout becomes denser the longer students have been at university. B shows that first-year students are more likely to connect with students who share their residence (dark grey: dorm unknown). Caltech is the network in the Facebook100 data set with the highest assortativity with respect to residence. Its nodes are visibly placed according to dorm membership (D, dark grey: year/dorm unknown), and less so according to year (C).
  • Figure 3: Leipzig Layout of the follower network of all German deputies who have a Twitter account (A). Members are colored according to their party and node size corresponds to overall node degree. A clear division between parties is visible, with a stronger division between the right-wing party AfD and the other parties. All parties except the Greens are arranged along a one-dimensional axis. This axis reflects differences in the number of cross-party ties among politicians of the same party. The further out a member is placed on the party-internal axis, the fewer cross-party ties they send and receive (C, colored according to parties, linear fits included). The AfD is an exception, since its members receive few ties from other parties regardless of where they are placed. ForceAtlas2, in comparison, separates nodes within party clusters more strongly because its repulsive force is proportional to $d^{-1}$ (B).
  • Figure 4: Retweet network of the Twitter debate about a letter on free speech published by Harper's Magazine (node size proportional to in-degree). A two-camp division is visible: the left pole includes the magazine as well as prominent signatories, while the right pole contains critics.
  • Figure 5: Visualisation of a survey on six different energy-generating technologies. The distribution of respondents is plotted as a density in the background (lighter areas indicate higher density). Respondents are concentrated near gas and the renewable technologies, as well as between them. Two technological axes are visible. One runs from coal and gas to the renewables. The other runs among the renewable technologies, with onshore wind and solar in central positions and offshore wind and biomass placed opposite each other.
  • ...and 7 more figures
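Panels C and D of Figure 1 compare inferred latent positions with the ground truth. Below is a minimal sketch of the distance-matrix comparison in panel D. It assumes that each unordered node pair enters once and that both position arrays list the nodes in the same order. The function names are illustrative, not taken from the paper's code.

```python
import numpy as np


def pairwise_distances(pos):
    """Euclidean distance matrix for an (n, dim) array of positions."""
    return np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)


def distance_matrix_correlation(pos_a, pos_b):
    """Pearson correlation between the upper triangles (i < j) of two
    distance matrices computed for the same nodes in the same order."""
    iu = np.triu_indices(pos_a.shape[0], k=1)
    da = pairwise_distances(pos_a)[iu]
    db = pairwise_distances(pos_b)[iu]
    return float(np.corrcoef(da, db)[0, 1])
```

Only pairwise distances enter the score, so it ignores rotations, reflections and translations of a layout, none of which change a distance-based latent space likelihood. Pearson correlation also ignores a uniform rescaling of all distances. A rotated, shifted and doubled copy of a layout should therefore score 1 up to floating-point error.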