QuickCent: a fast and frugal heuristic for harmonic centrality estimation on scale-free networks

Francisco Plana, Andrés Abeliuk, Jorge Pérez

TL;DR

QuickCent is a fast and frugal heuristic that approximates harmonic centrality in large scale-free networks by regressing the centrality on a sequence of binary clues derived from the in-degree. It relies on two key assumptions (a monotonic relationship between in-degree and harmonic centrality, and a power-law distribution of centrality) to compute concise summary statistics from quantile-based clues, which yields accurate, low-variance estimates even with limited training data. Experiments on synthetic and empirical networks show that QuickCent is competitive in accuracy with standard ML methods while offering favorable variance and running time, particularly on networks generated by preferential attachment. The work highlights the practical potential of ecologically rational heuristics for network measure estimation and outlines directions for extending the approach to other centralities and network models.
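
The idea of answering a short sequence of in-degree clues and returning a summary statistic can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the thresholds here are plain empirical in-degree quantiles of the training sample, whereas the paper derives its quantiles from a power-law assumption.

```python
from bisect import bisect_right
from statistics import median


def fit_quantile_clues(in_degrees, centralities, proportions):
    """Fit clue thresholds and per-bin medians from a training sample.

    Hypothetical helper: thresholds are empirical in-degree quantiles,
    not the power-law-based quantiles assumed in the paper.
    """
    pairs = sorted(zip(in_degrees, centralities))
    degs = [d for d, _ in pairs]
    n = len(degs)
    # One in-degree threshold per requested proportion (nearest-rank quantile).
    thresholds = sorted({degs[min(n - 1, int(p * n))] for p in proportions})
    # Group training centralities by how many thresholds each node reaches.
    bins = [[] for _ in range(len(thresholds) + 1)]
    for d, c in pairs:
        bins[bisect_right(thresholds, d)].append(c)
    overall = median(centralities)
    # An empty bin falls back to the overall training median.
    summaries = [median(b) if b else overall for b in bins]
    return thresholds, summaries


def estimate(in_degree, thresholds, summaries):
    """Answer the binary clues 'in_degree >= t?' and return that bin's median."""
    return summaries[bisect_right(thresholds, in_degree)]


# Toy training sample (made-up numbers, for illustration only).
train_deg = [0, 0, 1, 1, 2, 5, 9, 20]
train_cent = [1.0, 1.2, 2.0, 2.2, 3.0, 5.0, 7.0, 12.0]
thresholds, summaries = fit_quantile_clues(train_deg, train_cent, [0.5, 0.75])
prediction = estimate(3, thresholds, summaries)
```

Prediction only needs a node's in-degree and a handful of stored numbers, which is the sense in which such an estimator is frugal.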

Abstract

We present a simple and quick method to approximate network centrality indexes. Our approach, called QuickCent, is inspired by so-called fast and frugal heuristics, which were initially proposed to model some human decision and inference processes. The centrality index that we estimate is the harmonic centrality, a measure based on shortest-path distances and therefore infeasible to compute on large networks. We compare QuickCent with known machine learning algorithms on synthetic data generated with preferential attachment and on some empirical networks. Our experiments show that QuickCent makes estimates that are competitive in accuracy with the best alternative methods tested, both on synthetic scale-free networks and on empirical networks. QuickCent achieves estimates with low error variance, even with a small training set. Moreover, QuickCent is comparable in efficiency (accuracy and time cost) to more complex methods. We discuss and provide some insight into how QuickCent exploits the fact that in some networks, such as those generated by preferential attachment, local density measures such as the in-degree can be a proxy for the size of the network region to which a node has access, opening up the possibility of approximating size-based centrality indices such as the harmonic centrality. Our initial results show that simple heuristics and biologically inspired computational methods are a promising line of research in the context of network measure estimation.
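
To make the cost of the exact measure concrete, here is a minimal sketch of harmonic centrality computed by breadth-first search. It assumes the common convention $H(v) = \sum_{u \neq v} 1/d(u, v)$, with distances measured along directed paths arriving at $v$ and unreachable pairs contributing $0$; the function name and the edge-list input format are illustrative choices, not taken from the paper.

```python
from collections import deque


def harmonic_centrality(edges, nodes):
    """Exact harmonic centrality H(v) = sum over u != v of 1 / d(u, v).

    Assumes distances follow directed paths arriving at v;
    pairs with no such path contribute 0.
    """
    # Reverse adjacency: preds[v] lists the nodes with an edge into v.
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)

    scores = {}
    for target in nodes:
        dist = {target: 0}
        queue = deque([target])
        total = 0.0
        # A backward BFS from target finds d(u, target) for every u reaching it.
        while queue:
            x = queue.popleft()
            for u in preds[x]:
                if u not in dist:
                    dist[u] = dist[x] + 1
                    total += 1.0 / dist[u]
                    queue.append(u)
        scores[target] = total
    return scores


# Small directed example: 1 -> 0, 2 -> 0, 3 -> 1.
scores = harmonic_centrality([(1, 0), (2, 0), (3, 1)], [0, 1, 2, 3])
```

Each node requires its own traversal, so the total work grows roughly with the number of nodes times the size of the graph, which is what motivates estimating the measure from a local quantity such as the in-degree.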

Paper Structure

This paper contains 31 sections, 19 equations, 11 figures, 11 tables.

Figures (11)

  • Figure 1: A network randomly generated with linear preferential attachment.
  • Figure 2: Benchmark against other ML methods for different exponents of PA digraph instances with a training size of $10\%$. For each regression method, a boxplot shows the MAE distribution. Each box spans the $25$th to the $75$th percentile; its length is the inter-quartile range (IQR). The line inside the box indicates the median, and the rhombus indicates the mean. The whiskers extend from the edge of the box to the furthest point within $1.5$ times the IQR. Any data point beyond the whisker ends is considered an outlier and is drawn as a dot. For display purposes, the vertical limit of the plots has been set to $10$, since the highest MAE outliers of NN or L, depending on the PA exponent, would otherwise obscure the details of the model performance.
  • Figure 3: Benchmark against other ML methods for different exponents of PA digraph instances with a training size of $100\%$. For each regression method, a boxplot shows the MAE distribution. Each box spans the $25$th to the $75$th percentile; its length is the inter-quartile range (IQR). The line inside the box indicates the median, and the rhombus indicates the mean. The whiskers extend from the edge of the box to the furthest point within $1.5$ times the IQR. Any data point beyond the whisker ends is considered an outlier and is drawn as a dot. For display purposes, the vertical limit of the first two plots has been set to $10$, since the highest MAE outliers of NN or L, depending on the PA exponent, would otherwise obscure the details of the model performance.
  • Figure 4: Effect of randomization on different ML methods with a training size of $30\%$. Each boxplot group is labeled with the name of the ML method, a dot, and the type of network on which the estimates are made ('PL' for the initial PA network, 'RPL' for the network after randomization). QC8 corresponds to QuickCent with a proportion vector of length $8$, and analogously for QC1. For each regression method, a boxplot shows the MAE distribution. Each box spans the $25$th to the $75$th percentile; its length is the inter-quartile range (IQR). The line inside the box indicates the median, and the rhombus indicates the mean. The whiskers extend from the edge of the box to the furthest point within $1.5$ times the IQR. Any data point beyond the whisker ends is considered an outlier and is drawn as a dot. For display purposes, the vertical limit of the plots has been set to $10$, since the highest MAE outliers of NN would otherwise obscure the details of the model performance.
  • Figure 5: Effect of centrality distribution on different ML methods with a training size of $30\%$. Each boxplot group is labeled with the name of the ML method, a dot, and the type of network on which the estimates are made ('mb' for moreno_blogs, 'sj' for subelj_jung-j, 'ERmb' for the ER digraph created with the parameters of moreno_blogs, and analogously for 'ERsj'). The number after 'QC' is the length of the vector of proportions used by that method, corresponding to the best accuracy for the respective network. For each regression method, a boxplot shows the MAE distribution. Each box spans the $25$th to the $75$th percentile; its length is the inter-quartile range (IQR). The line inside the box indicates the median, and the rhombus indicates the mean. The whiskers extend from the edge of the box to the furthest point within $1.5$ times the IQR. Any data point beyond the whisker ends is considered an outlier and is drawn as a dot. For display purposes, the vertical limit of the control network plot has been set to $150$, since the highest MAE outliers of NN would otherwise obscure the details of the model performance.
  • ...and 6 more figures
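
The boxplot convention described in the captions above can be made concrete with a short sketch. The function name and the sample values are hypothetical, and the quartiles use the inclusive interpolation of Python's `statistics.quantiles`; plotting libraries may compute quartiles slightly differently, so this is an illustration of the convention rather than a reproduction of the paper's figures.

```python
from statistics import mean, quantiles


def boxplot_summary(values):
    """Summarize a sample the way the captions describe a boxplot."""
    # Quartiles with inclusive interpolation (an assumed convention).
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the furthest data points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "mean": mean(values),
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are the dots drawn as outliers.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up error values with one extreme point.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy sample the single extreme value pulls the mean well above the median, which is why the captions report both and why the plots cap the vertical axis.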

Theorems & Definitions (3)

  • Example 2.1
  • Example 3.1
  • Example 3.2