Network Sampling: An Overview and Comparative Analysis

Quoc Chuong Nguyen

TL;DR

The paper investigates how well different network sampling methods preserve structural properties in static versus temporal networks. By comparing node-based, edge-based, and exploration-based approaches on a static CA-HepTh collaboration network and a temporal CollegeMsg network, it shows that no single method consistently preserves metrics across contexts. Advanced strategies perform best on static graphs, while simpler methods can outperform them in temporal settings, underscoring the need for context-aware, metric-driven sampling choices. The findings offer practical guidance for researchers selecting sampling methods tailored to network type and analytical goals, and point to future work on adaptive sampling for evolving systems, broader datasets, and metric-specific strategies.

Abstract

Network sampling is a crucial technique for analyzing large or partially observable networks. However, the effectiveness of different sampling methods can vary significantly depending on the context. In this study, we empirically compare representative methods from three main categories: node-based, edge-based, and exploration-based sampling. We used two real-world datasets for our analysis: a scientific collaboration network and a temporal message-sending network. Our results indicate that no single sampling method consistently outperforms the others in both datasets. Although advanced methods tend to provide better accuracy on static networks, they often perform poorly on temporal networks, where simpler techniques can be more effective. These findings suggest that the best sampling strategy depends not only on the structural characteristics of the network but also on the specific metrics that need to be preserved or analyzed. Our work offers practical insights for researchers in choosing sampling approaches that are tailored to different types of networks and analytical objectives.
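
As a rough illustration of the three categories named in the abstract, the sketch below implements one simple representative of each on a plain edge list. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy graph, and the choice to return the induced subgraph for the node-based and exploration-based samplers are all hypothetical choices made for this example.

```python
import random


def uniform_node_sample(edges, k, rng):
    """Node-based: pick k nodes uniformly, keep edges whose endpoints were both picked."""
    nodes = sorted({n for edge in edges for n in edge})
    chosen = set(rng.sample(nodes, k))
    return [(u, v) for u, v in edges if u in chosen and v in chosen]


def uniform_edge_sample(edges, m, rng):
    """Edge-based: pick m edges uniformly; the sampled nodes are their endpoints."""
    return rng.sample(edges, m)


def random_walk_sample(edges, k, rng, max_steps=10_000):
    """Exploration-based: walk the graph until k distinct nodes are visited
    (or max_steps is reached), then return the subgraph induced on them."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    current = rng.choice(sorted(adj))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= k:
            break
        current = rng.choice(adj[current])
        visited.add(current)
    return [(u, v) for u, v in edges if u in visited and v in visited]


# Hypothetical toy graph: a triangle (0, 1, 2) attached to a path 2-3-4-5.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5)]
rng = random.Random(0)  # fixed seed so repeated runs draw the same samples

node_sample = uniform_node_sample(toy_edges, 4, rng)
edge_sample = uniform_edge_sample(toy_edges, 3, rng)
walk_sample = random_walk_sample(toy_edges, 4, rng)
```

Even on a toy graph the difference in behaviour is visible: the node-based sample may contain few or no edges, the edge-based sample always contains exactly the requested number of edges, and the random-walk sample is connected by construction.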

Paper Structure

This paper contains 10 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Network sampling methodologies can be categorized into three primary approaches: node-based, edge-based, and exploration-based sampling. Node-based methods select nodes either uniformly or based on specific attributes. These methods often preserve properties at the node level. Edge-based techniques sample edges directly, which can better maintain connectivity patterns within the network. However, these methods may distort the distribution of nodes. Exploration-based methods, such as random walks or snowball sampling, traverse the network dynamically. These approaches are especially suited for large-scale or partially observable networks.
  • Figure 2: The collaboration network, after the removal of loops (self-edges), contains 9,877 nodes and 25,973 edges, with a clustering coefficient of 0.4714. There is one dominant connected component of 8,638 nodes, which is a fraction of approximately 0.875 of all nodes. The average shortest path length within this component is 5.95, indicating a small-world effect. Additionally, the degree distribution follows a power law, which is characteristic of a scale-free network.
  • Figure 3: In the temporal network for this study, the number of edges decreases over time, suggesting that the relationships are short-term: individuals stop sending messages to each other. The subnetwork consists of 116 nodes, which remain constant (this is a multiplex network). The original network contains 1,899 nodes and 59,835 temporal edges, while the static projected graph has 20,296 edges. For simplicity, we convert this directed network to an undirected one.
  • Figure 4: Performance comparison of six sampling methods on the static CA-HepTh network across key structural metrics. The black dashed line represents the values from the original full network. As sample size increases, most methods converge toward the true metrics, though the rate and accuracy of convergence vary. Node-based methods effectively approximate node-level properties such as average degree, while edge-based methods better preserve global structures like the largest component size. Exploration-based methods capture connectivity patterns but exhibit bias toward high-centrality nodes. Uniform sampling performs poorly due to its mismatch with the scale-free nature of real-world networks.
  • Figure 5: These boxplots compare the performance of eight different network sampling methods on the CA-HepTh network. Each method was evaluated using 100 samples, each consisting of 1,000 nodes. Dashed black lines in the plots represent the corresponding values from the original full network. Results indicate that methods such as RWS, SS, and MHRWS maintain higher fidelity across most metrics, closely approximating the original network structure. In contrast, methods like UNS and UES demonstrate significant deviations from the original structure.
  • ...and 2 more figures
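
Two of the metrics mentioned in the captions above, average degree and the relative size of the largest connected component, can be computed from an edge list with the standard library alone. The following is a minimal sketch assuming an undirected graph given as a list of node pairs; the helper names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are dropped but their node is kept."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u]  # touching the key registers the node even if it ends up isolated
        adj[v]
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def largest_component_fraction(adj):
    """Fraction of all nodes that lie in the largest connected component (BFS)."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / len(adj)


# Hypothetical toy data: a triangle, a separate edge, and a node with only a self-loop.
toy_edges = [(1, 2), (2, 3), (3, 1), (4, 5), (6, 6)]
toy_adj = build_adjacency(toy_edges)
toy_avg_degree = average_degree(toy_adj)
toy_lcc_fraction = largest_component_fraction(toy_adj)
```

Comparing such values between a sampled subgraph and the full graph gives the kind of per-metric deviation that the dashed reference lines in the figures represent.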