Higher-Order Network Structure Inference: A Topological Approach to Network Selection

Adam Schroeder, Russell Funk, Jingyi Guan, Taylor Okonek, Lori Ziegelmeier

TL;DR

The paper tackles the challenge of selecting robust threshold parameters for complex networks by incorporating higher-order topology through persistent homology. It introduces a pipeline that maps each parameter choice in the parameter space $U$ to a network, converts the network's topological features into persistence images, and optimizes thresholds via a tangent-space stability measure, subject to user-defined topological constraints set through hyperparameters $\delta_k$. Applied to concept networks from Dimensions AI in applied mathematics, the method yields thresholded networks with stable topological structure and backbone-like sparsity, while allowing domain-specific constraints. The authors connect the optimization to maximum-likelihood principles and propose a higher-order variance framework to explain and validate the observed stability. They also note the method's computational costs and its potential to generalize to other parameterization problems.

Abstract

Thresholding--the pruning of nodes or edges based on their properties or weights--is an essential preprocessing tool for extracting interpretable structure from complex network data, yet existing methods face several key limitations. Threshold selection often relies on heuristic methods or trial and error due to large parameter spaces and unclear optimization criteria, leading to sensitivity where small parameter variations produce significant changes in network structure. Moreover, most approaches focus on pairwise relationships between nodes, overlooking critical higher-order interactions involving three or more nodes. We introduce a systematic thresholding algorithm that leverages topological data analysis to identify optimal network parameters by accounting for higher-order structural relationships. Our method uses persistent homology to compute the stability of homological features across the parameter space, identifying parameter choices that are robust to small variations while preserving meaningful topological structure. Hyperparameters allow users to specify minimum requirements for topological features, effectively constraining the parameter search to avoid spurious solutions. We demonstrate the approach with an application in the Science of Science, where networks of scientific concepts are extracted from research paper abstracts, and concepts are connected when they co-appear in the same abstract. The flexibility of our approach allows researchers to incorporate domain-specific constraints and extends beyond network thresholding to general parameterization problems in data analysis.
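
To make the thresholding step concrete, the following is a minimal Python sketch of building a concept network from abstracts and pruning it by concept frequency. It is not the authors' implementation. It assumes that $\tau(v)$ counts the abstracts in which concept $v$ appears and that nodes are kept when their frequency lies between a lower and an upper bound. The function names, the `lower`/`upper` parameters, and the toy abstracts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def concept_frequency(abstracts):
    """Count the abstracts in which each concept appears (assumed meaning of tau(v))."""
    tau = Counter()
    for concepts in abstracts.values():
        tau.update(set(concepts))  # count each concept once per abstract
    return tau


def thresholded_network(abstracts, lower, upper):
    """Keep concepts with lower <= tau(v) <= upper; link kept concepts that co-appear."""
    tau = concept_frequency(abstracts)
    kept = {v for v, count in tau.items() if lower <= count <= upper}
    edges = set()
    for concepts in abstracts.values():
        present = sorted(set(concepts) & kept)
        edges.update(combinations(present, 2))  # one edge per co-appearing pair
    return kept, edges


# Hypothetical toy data: three abstracts (a, b, c) over four concepts (A-D).
abstracts = {
    "a": ["A", "B", "C"],
    "b": ["B", "C"],
    "c": ["C", "D"],
}

nodes, edges = thresholded_network(abstracts, lower=2, upper=3)
print(sorted(nodes), sorted(edges))
```

Raising `lower` from 1 to 2 in this toy example removes two of the four concepts and three of the four edges, which is the kind of sensitivity to threshold choice that the abstract describes.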

Paper Structure

This paper contains 18 sections, 18 equations, 17 figures.

Figures (17)

  • Figure 1: A toy example depicting a filtration of a dynamic network (top row) and its resulting persistence diagrams for dimensions $k=0,1,2$ (bottom row). The filtration parameter is $t$. The topological features appear as coordinates in the plot, with navy blue indicating dimension zero features (connected components), red indicating dimension one features (cycles), and green indicating dimension two features (trapped volumes).
  • Figure 2: A toy concept network on four concepts, labeled $A$, $B$, $C$, and $D$. These concepts are joined through co-appearance in the abstracts of papers $a$, $b$, and $c$.
  • Figure 3: Histogram of concept frequency $\tau(v)$ for applied mathematics (ANZSRC code 0102) from Dimensions AI (see Section \ref{subsec:data_generation} for details on the data). The $y$-axis (count of concepts per bin) uses a logarithmic scale for legibility. While critical threshold regions can be roughly identified by eye---for instance, natural lower and upper bounds might appear to occur at $\tau(v)=10$ and $150{,}000$ respectively---small variations in these parameters can produce substantially different networks. A network constructed with lower bound $\ell=10$ may differ considerably from one with $\ell=5$, illustrating the sensitivity of network structure to threshold selection and the need for a principled approach to parameter choice.
  • Figure 4: Illustration of how the concatenated persistence image vectors trace out the latent manifold; in particular, here only the parameter associated with index $i$ is varied, and so the resulting variation in $\mathbb{R}^n$ is only one-dimensional. The magnitudes of the difference vectors $\rho_{i,j} - \rho_{i-1,j}$ and $\rho_{i+1,j} - \rho_{i,j}$ are used when computing the tangent space.
  • Figure 5: Visualization of the algorithm's pipeline, excluding the optimization step. Variations in the parameter (threshold) domain $U$ result in different networks in the feature space $\mathcal{X}$. Persistent homology, denoted $\mathcal{H}$, then transforms these networks into elements of $P$, multisets called persistence diagrams. Each network has an associated representation in $\mathbb{R}^n$ via persistence images, $\rho$, and the tangent space $\nabla \rho$ allows us to study each representation on the lower-dimensional latent space. In this illustration, the local tangent space is a plane, but in general it will be a hypersurface whose dimensionality equals that of the original parameter domain.
  • ...and 12 more figures
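
The difference vectors named in the Figure 4 caption can be illustrated with a short Python sketch. It assumes the persistence image vectors are stored on a grid `rho[i][j]` of equal-length lists in $\mathbb{R}^n$, and computes only the magnitudes of $\rho_{i,j} - \rho_{i-1,j}$ and $\rho_{i+1,j} - \rho_{i,j}$. How the paper combines these magnitudes into its stability measure is not stated in this excerpt, so that step is omitted. The function name and the toy grid are hypothetical.

```python
import math


def difference_magnitudes(rho, i, j):
    """Return (|rho[i][j] - rho[i-1][j]|, |rho[i+1][j] - rho[i][j]|) as Euclidean norms."""
    if not 0 < i < len(rho) - 1:
        raise IndexError("index i needs a neighbor on both sides")
    backward = math.dist(rho[i][j], rho[i - 1][j])
    forward = math.dist(rho[i + 1][j], rho[i][j])
    return backward, forward


# Hypothetical 3 x 1 grid of persistence image vectors in R^2.
rho = [
    [[0.0, 0.0]],
    [[3.0, 4.0]],
    [[3.0, 5.0]],
]

backward, forward = difference_magnitudes(rho, i=1, j=0)
print(backward, forward)
```

Small magnitudes on both sides of a grid point indicate that the persistence image changes little when the parameter with index $i$ is perturbed, which is the intuition behind preferring stable parameter choices.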