Efficient quantification on large-scale networks

Alessio Micheli, Alejandro Moreo, Marco Podda, Fabrizio Sebastiani, William Simoni, Domenico Tortorella

TL;DR

This work tackles network quantification under prior probability shift by introducing XNQ, a three-component framework that combines unsupervised reservoir-based node embeddings (GESN), a calibrated intermediate readout classifier, and a quantification-focused EM-based aggregator (SLD). XNQ achieves state-of-the-art performance across large-scale, heterophilic graphs and remains efficient enough to scale to hundreds of thousands of nodes, outperforming existing NQ methods by substantial margins. Through extensive ablations, the study demonstrates the critical roles of the embedding, calibration, and EM-based quantifier, and confirms the method’s effectiveness in both binary and multi-class settings. The approach shows promise for practical deployments in social networks, political science, and epidemiology, where accurate prevalence estimation over subpopulations is essential.

Abstract

Network quantification (NQ) is the problem of estimating the proportions of nodes belonging to each class in subsets of unlabelled graph nodes. When prior probability shift is at play, this task cannot be effectively addressed by first classifying the nodes and then counting the class predictions. In addition, unlike non-relational quantification, NQ demands enhanced flexibility in order to capture a broad range of connectivity patterns, resilience to the challenge of heterophily, and scalability to large networks. In order to meet these stringent requirements, we introduce XNQ, a novel method that synergizes the flexibility and efficiency of the unsupervised node embeddings computed by randomized recursive Graph Neural Networks, with an Expectation-Maximization algorithm that provides a robust quantification-aware adjustment to the output probabilities of a calibrated node classifier. In an extensive evaluation, in which we also validate the design choices underpinning XNQ through comprehensive ablation experiments, we find that XNQ consistently and significantly improves on the best network quantification methods to date, thereby setting the new state of the art for this challenging task. XNQ also provides a training speed-up of up to 10x-100x over other methods based on graph learning.
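To make the abstract's two central claims concrete, the following is a minimal sketch, not the authors' implementation, of why classifying and counting breaks down under prior probability shift and how an EM adjustment of calibrated posteriors can recover the test prevalence. It uses a synthetic one-dimensional problem in place of graph embeddings. The function name `sld_em`, the toy Gaussian data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def sld_em(posteriors, train_prev, max_iter=1000, tol=1e-6):
    """EM re-estimation of class prevalences from calibrated posteriors.

    posteriors: (n_samples, n_classes) probabilities from a classifier
                trained under the class prevalences `train_prev`.
    Returns the estimated test prevalences and the adjusted posteriors.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    train_prev = np.asarray(train_prev, dtype=float)
    prev = train_prev.copy()
    adjusted = posteriors
    for _ in range(max_iter):
        # E-step: rescale each posterior by (current prior / training prior)
        weighted = posteriors * (prev / train_prev)
        adjusted = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the new prevalence estimate is the mean adjusted posterior
        new_prev = adjusted.mean(axis=0)
        converged = np.abs(new_prev - prev).sum() < tol
        prev = new_prev
        if converged:
            break
    return prev, adjusted


# Toy data (assumption): class 0 ~ N(-1, 1), class 1 ~ N(+1, 1).
# The "classifier" is the exact posterior under balanced training priors,
# so it is perfectly calibrated for a 50/50 training set.
rng = np.random.default_rng(0)
n = 20_000
true_test_prev = 0.2                      # shifted away from the 0.5 seen in training
y = rng.random(n) < true_test_prev
x = rng.normal(np.where(y, 1.0, -1.0), 1.0)
p1 = 1.0 / (1.0 + np.exp(-2.0 * x))       # P(class 1 | x) under 50/50 priors
posteriors = np.column_stack([1.0 - p1, p1])

# Classify and count: take the hard predictions and count them.
cc_prev = np.bincount(posteriors.argmax(axis=1), minlength=2) / n

# EM-based adjustment starting from the training prevalences.
em_prev, _ = sld_em(posteriors, train_prev=[0.5, 0.5])

print("true prevalence of class 1:", true_test_prev)
print("classify and count        :", round(float(cc_prev[1]), 3))
print("EM-adjusted estimate      :", round(float(em_prev[1]), 3))
```

In this toy setting the classify-and-count estimate is biased toward the training prevalence, while the EM estimate moves toward the true test prevalence. The sketch relies on the posteriors being well calibrated, which is consistent with the abstract's emphasis on a calibrated node classifier.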

Paper Structure

This paper contains 33 sections, 10 equations, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: Node classification
  • Figure 2: Network quantification (this work)
  • Figure 4: In the binary case, diagonal plots provide a visual tool to compare quantifiers. Here, $q_2$ is closer to the ideal quantifier behaviour (dashed diagonal), and thus superior to $q_1$.
  • Figure 5: XNQ is composed of three modules applied sequentially.
  • Figure 6: Diagonal plots compare XNQ against other NQ baselines for different test class prevalence values generated via the APP. Shaded bands represent standard deviations.
  • ...and 4 more figures