Quantifying metadata relevance to network block structure using description length

Lena Mangold, Camille Roth

TL;DR

The metablox tool is proposed to quantify the relationship between a network’s node metadata and its mesoscale structure, measuring the strength of the relationship and the type of structural arrangement exhibited by the metadata.

Abstract

Network analysis is often enriched by examining node metadata. In studies of the mesoscale structure of networks, it is often assumed that node groups based on metadata and node groups based on connectivity patterns are intrinsically linked. This assumption is increasingly being challenged: metadata might be entirely unrelated to structure or, similarly, multiple sets of metadata might be relevant to the structure of a network in different ways. We propose the metablox tool to quantify the relationship between a network's node metadata and its mesoscale structure, measuring both the strength of the relationship and the type of structural arrangement exhibited by the metadata. We show on a number of synthetic and empirical networks that our tool distinguishes relevant metadata and does so in a comparative setting, demonstrating that it can be used as part of systematic meta-analyses comparing networks from different domains.

Paper Structure (26 sections, 16 equations, 9 figures)

Figures (9)

  • Figure 1: Two partitions of a toy network. (a) Example network described in main text, with nodes coloured according to their block membership in the planted bicommunity partition. (b) The same example network, with nodes coloured according to their block membership in the planted core-periphery partition. Both visualisations have been drawn using the graph-tool library [peixoto_graph-tool_2014], which is used for all network visualisations in this paper.
  • Figure 2: Schematic of a partition landscape. (a) Partition landscape [peel_ground_2017, peixoto_revealing_2021] for a toy network, with the negative description length on the vertical axis, for which we have highlighted the positions of the optimal inferred partition (blue/red), a second partition (blue/red), and a metadata partition $d$ (purple/green). (b) Approach of measuring partition similarity directly. (c) Proposed metablox approach of measuring the distance of the network's description length under the metadata partition from the network's description length under the optimal partition.
  • Figure 3: Metablox and BESTest values for a synthetic network. Metablox values for the degree-corrected and planted partition SBM, and BESTest values [peel_ground_2017] under the degree-corrected SBM, for a synthetic network with multiple sets of metadata. (a) Values for the sets of bicommunity-like metadata. (b) Values for the sets of core-periphery-like metadata. Both panels show an increasing correlation $\rho$ between metadata and block structure on the x-axis.
  • Figure 4: Metablox values for multiple metadata on law firm networks. Degree-corrected, non-degree-corrected and planted partition dimensions of the metablox vector for each of five sets of metadata (status, gender, office, type of law practised, and law school attended) for three networks of employees of a law firm [lazega_collegial_2001]. (a) Advice network. (b) Friendship network. (c) Coworking network. In each panel, the SBM variant that gives the lowest edge compression for the network is highlighted in red.
  • Figure 5: Metablox values for multiple Twitter/X networks. Degree-corrected, non-degree-corrected and planted partition dimensions of the metablox vector for various Twitter/X networks. (a) Static snapshots, each representing a non-overlapping one-year period, of a Twitter/X retweet network among users discussing the topic of impact investing, with user location (country) as shared metadata (scenario II). (b) Three Twitter/X interaction networks [garimella_political_2018, hohmann_quantifying_2023] with shared metadata representing the users' political stance (liberal vs conservative) (scenario III). The SBM variant that gives the lowest edge compression for each network is highlighted in red.
  • ...and 4 more figures
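
The comparison described in the caption of Figure 2 (panel c), in which a metadata partition is scored by how far the network's description length under it lies from the description length under a reference partition, can be illustrated with a minimal sketch. This is not the metablox implementation and does not reproduce the paper's definition of the metablox vector. It assumes a simplified code length (the maximum-likelihood Bernoulli SBM cost of the edges only, with no model cost and no degree correction), and it uses a hand-planted partition as a stand-in for an inferred optimal one. All function and variable names are hypothetical.

```python
import math
from itertools import combinations


def binary_entropy_nats(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def edge_code_length(n_nodes, edges, partition):
    """Simplified description length of the edges given a partition.

    Assumption: a Bernoulli SBM at its maximum-likelihood densities,
    counting only the cost of the edges (the cost of describing the
    partition and the model parameters is ignored).
    """
    edge_set = {frozenset(e) for e in edges}
    pair_counts = {}  # node pairs per unordered block pair
    link_counts = {}  # observed edges per unordered block pair
    for u, v in combinations(range(n_nodes), 2):
        key = tuple(sorted((partition[u], partition[v])))
        pair_counts[key] = pair_counts.get(key, 0) + 1
        if frozenset((u, v)) in edge_set:
            link_counts[key] = link_counts.get(key, 0) + 1
    return sum(
        n_pairs * binary_entropy_nats(link_counts.get(key, 0) / n_pairs)
        for key, n_pairs in pair_counts.items()
    )


def description_gap(n_nodes, edges, metadata, reference):
    """Extra nats needed under the metadata partition vs. the reference."""
    return (edge_code_length(n_nodes, edges, metadata)
            - edge_code_length(n_nodes, edges, reference))


# Toy network: two 4-node cliques joined by a single bridge edge.
EDGES = (
    [(u, v) for u, v in combinations(range(0, 4), 2)]
    + [(u, v) for u, v in combinations(range(4, 8), 2)]
    + [(3, 4)]
)

structure = [0, 0, 0, 0, 1, 1, 1, 1]  # planted blocks (stand-in reference)
aligned = [0, 0, 0, 0, 1, 1, 1, 1]    # metadata matching the blocks
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]  # metadata cutting across the blocks

gap_aligned = description_gap(8, EDGES, aligned, structure)
gap_unrelated = description_gap(8, EDGES, unrelated, structure)
print(f"aligned metadata gap:   {gap_aligned:.3f} nats")
print(f"unrelated metadata gap: {gap_unrelated:.3f} nats")
```

In this toy setting, metadata that coincides with the planted blocks has a gap of zero, while metadata that cuts across both cliques needs more nats to describe the same edges. A practical analysis would replace the simplified code length with a full SBM description length (for example as computed by graph-tool) and the planted partition with an inferred one.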