Collecting Influencers: A Comparative Study of Online Network Crawlers

Mikhail Drobyshevskiy; Denis Aivazov; Denis Turdakov; Alexander Yatskov; Maksim Varlamov; Danil Shayhelislamov

TL;DR

Six known crawlers are compared on the task of collecting a fraction of the most influential nodes of a graph. The results confirm that greedy methods perform best in many settings, but there are cases where they are very inefficient.

Abstract

Online network crawling tasks require considerable effort from researchers to collect the data. One such task is the identification of important nodes, which has many applications, from viral marketing to the prevention of disease spread. Various crawling algorithms have been suggested, but their efficiency has not been studied well. In this paper we compare six known crawlers on the task of collecting a fraction of the most influential nodes of a graph. We analyze crawler behavior for four measures of node influence: node degree, k-coreness, betweenness centrality, and eccentricity. The experiments confirm that greedy methods perform best in many settings, but there are cases where they are very inefficient.

Paper Structure (15 sections, 5 figures, 2 tables)

This paper contains 15 sections, 5 figures, 2 tables.

Introduction
Problem definition and methodology
Problem Definition
Crawlers
Dataset
Method
Experiments
Nodes coverage
Seed choice influence
Influential nodes coverage
Degree and k-coreness
Eccentricity
Aggregated results
Related work
Conclusion

Figures (5)

Figure 1: Top: node coverage $c^{nodes}=|V'| / |V|$. Bottom: the same results, shown as the gap between each method and the best result at each point (higher is better).
Figure 2: DCAM graph. Left: network structure. Right: node coverage with variation for several seeds. Dotted lines correspond to individual seeds, bold lines correspond to averaged values.
Figure 3: Slashdot graph. Left: Venn diagram for top centrality nodes sets. Right: node coverage with variation for several seeds. Dotted lines correspond to individual seeds, bold lines correspond to averaged values.
Figure 4: Target set coverage $c^{target}_c=|V'_c \cap V^*| / |V^*|$ depending on the fraction of nodes crawled. The lower plot shows the gap between each method and the best result at each point (higher is better).
Figure 5: AUC values computed for node coverage $c^{nodes}$ and target set coverage $c^{target}_c=|V'_c \cap V^*| / |V^*|$. Right: winners aggregated over the graphs. Colored bars' heights denote how many times this crawler was the best one at specific measures.
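
The coverage measures in the captions of Figures 1 and 4 can be illustrated with a minimal sketch on a toy graph. Several things here are assumptions made for illustration and are not taken from the paper: $V'$ is read as the set of observed nodes (crawled nodes plus their neighbors), $V'_c$ as the set of crawled nodes, and $V^*$ as the top fraction of nodes by degree. The greedy rule (crawl next the observed node with the most known links), the function names, and the toy graph are likewise hypothetical, and the area under each curve is approximated by a plain mean over crawl steps.

```python
def top_fraction_by_degree(adj, fraction):
    """Assumed target set V*: top `fraction` of nodes by degree (ties by node id)."""
    k = max(1, round(fraction * len(adj)))
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    return set(ranked[:k])


def greedy_crawl_order(adj, seed):
    """Hypothetical greedy crawler: always crawl the observed, not yet crawled
    node with the most links to already crawled nodes (ties by node id)."""
    crawled, order = set(), []
    known_links = {seed: 0}  # observed but uncrawled node -> links seen so far
    while known_links:
        v = min(known_links, key=lambda u: (-known_links[u], u))
        del known_links[v]
        crawled.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in crawled:
                known_links[w] = known_links.get(w, 0) + 1
    return order


def coverage_curves(adj, order, target):
    """Return (c_nodes, c_target) after each crawl step."""
    observed, crawled = set(), set()
    c_nodes, c_target = [], []
    for v in order:
        crawled.add(v)
        observed.add(v)
        observed.update(adj[v])  # crawling v reveals its neighbors
        c_nodes.append(len(observed) / len(adj))
        c_target.append(len(crawled & target) / len(target))
    return c_nodes, c_target


def mean_area(curve):
    """Simple stand-in for AUC: mean curve value over crawl steps."""
    return sum(curve) / len(curve)


# Toy undirected graph: a hub (node 0) with four leaves, plus a tail 4-5.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
target = top_fraction_by_degree(adj, 1 / 3)
order = greedy_crawl_order(adj, seed=5)
c_nodes, c_target = coverage_curves(adj, order, target)
print(order, c_nodes, c_target, mean_area(c_target))
```

Starting from the tail node, this toy crawler reaches the hub on its third step, so both curves saturate early; a seed far from the high-degree region would delay that, which is the kind of seed dependence the per-seed curves in Figures 2 and 3 are meant to show.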
