Constructing and Analyzing Different Density Graphs for Path Extrapolation in Wikipedia

Martha Sotiroudi, Anastasia-Sotiria Toufa, Constantine Kotropoulos

TL;DR

The paper investigates path extrapolation on Wikipedia graphs by introducing the Wikipedia Central Macedonia (WCM) dataset, built by crawling from a Central Macedonia seed article. It advances GRETEL with Dual Hypergraph Transformation (DHT) and novel dual-hypergraph features to better capture complex interactions, and evaluates performance on dense and sparse WCM graphs as well as on Wikispeedia. Key findings show that hypergraph features improve accuracy, that Dual GRETEL excels on dense graphs while sparse graphs benefit from reduced noise, and that the dense WCM graph can outperform Wikispeedia in top-5 precision despite its smaller size. These results highlight the importance of graph structure and feature extraction in path extrapolation, with practical implications for navigation prediction in large knowledge graphs. The publicly available WCM dataset further enables reproducibility and future work on density-aware, graph-based path modeling.
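For readers unfamiliar with DHT, the transformation swaps the roles of nodes and edges: in the form it is usually stated, each original edge becomes a node carrying the edge's features, each original node becomes a hyperedge carrying the node's features, and the node-edge incidence matrix M is transposed to M^T. The sketch below illustrates only that structural step on a toy graph. It is not the paper's Dual GRETEL code; it treats links as undirected for simplicity, and the function names and feature values are hypothetical.

```python
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Return the n x m node-edge incidence matrix M, with M[v, e] = 1 if edge e touches node v."""
    M = np.zeros((num_nodes, len(edges)), dtype=int)
    for e, (u, v) in enumerate(edges):
        M[u, e] = 1
        M[v, e] = 1
    return M

def dual_hypergraph(node_feats, edges, edge_feats):
    """Swap the roles of nodes and edges (hypothetical helper, undirected sketch).

    Returns (dual_node_feats, dual_incidence, hyperedge_feats): each original edge
    becomes a node carrying its edge features, and each original node becomes a
    hyperedge (carrying its node features) that contains all of its incident edges.
    """
    M = incidence_matrix(node_feats.shape[0], edges)
    return edge_feats, M.T, node_feats

# Toy graph: three articles with links 0-1 and 1-2 (link direction ignored here).
X = np.array([[1.0], [2.0], [3.0]])   # hypothetical node (article) features
E = np.array([[10.0], [20.0]])        # hypothetical edge (link) features
dual_X, dual_M, hyper_feats = dual_hypergraph(X, [(0, 1), (1, 2)], E)
print(dual_M)
```

In the resulting dual incidence matrix, rows are the two former links and columns are the three former articles; the middle column has two ones because the hyperedge for article 1 contains both links that touch it. A GNN operating on this dual structure can then learn directly on edge (link) representations.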

Abstract

Graph-based models have become pivotal in understanding and predicting navigational patterns within complex networks. Building on such models, the paper advances path extrapolation methods to efficiently predict Wikipedia navigation paths. The Wikipedia Central Macedonia (WCM) dataset is sourced from Wikipedia, using the article on the Central Macedonia region of Greece as the starting point for path generation. To build WCM, a crawling process that simulates human navigation through Wikipedia is used. Experiments show that an extension of the graph neural network GRETEL, which employs dual hypergraph transformation, performs better on a dense WCM graph than on a sparse WCM graph. Moreover, combining hypergraph features with features extracted from graph edges is shown to enhance the model's effectiveness. The model also performs better on the dense WCM graph than on the larger Wikispeedia dataset, suggesting that dataset size may be less influential for predictive accuracy than the quality of connections and feature extraction. The paper fits the Knowledge Discovery and Machine Learning track of the 16th International Conference on Advances in Databases, Knowledge, and Data Applications.
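The abstract states only that WCM is built by a crawler that simulates human navigation, starting from the Central Macedonia article. One plausible way to picture such a crawler, offered purely as an assumption and not as the authors' procedure, is a random walk that follows a uniformly chosen outgoing link at each step and records the traversed pages and links. In the sketch below, the link table `LINKS` and the functions `random_walk` and `crawl` are hypothetical stand-ins; a real crawler would fetch each article's links from Wikipedia instead of reading them from an in-memory dictionary.

```python
import random

# Hypothetical toy link structure standing in for Wikipedia articles and their outgoing links.
LINKS = {
    "Central Macedonia": ["Thessaloniki", "Greece", "Mount Olympus"],
    "Thessaloniki": ["Greece", "White Tower of Thessaloniki"],
    "Greece": ["Athens", "Thessaloniki"],
    "Mount Olympus": ["Greece"],
    "White Tower of Thessaloniki": ["Thessaloniki"],
    "Athens": [],
}

def random_walk(links, seed_article, max_steps, rng):
    """Follow uniformly random outgoing links from seed_article for at most max_steps hops."""
    path = [seed_article]
    current = seed_article
    for _ in range(max_steps):
        neighbours = links.get(current, [])
        if not neighbours:  # dead end: stop the walk here
            break
        current = rng.choice(neighbours)
        path.append(current)
    return path

def crawl(links, seed_article, num_walks, max_steps, seed=0):
    """Sample several walks and collect the visited pages (nodes) and followed links (edges)."""
    rng = random.Random(seed)
    paths = [random_walk(links, seed_article, max_steps, rng) for _ in range(num_walks)]
    nodes = {page for path in paths for page in path}
    edges = {(a, b) for path in paths for a, b in zip(path, path[1:])}
    return paths, nodes, edges

paths, nodes, edges = crawl(LINKS, "Central Macedonia", num_walks=5, max_steps=4)
for p in paths:
    print(" -> ".join(p))
```

Fixing the random seed makes the sampled paths reproducible. Under this assumption, the collected paths would serve as navigation examples, and the collected nodes and edges would form the graph on which path extrapolation is trained.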

Paper Structure

This paper contains 12 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Dense Wikipedia Graph.
  • Figure 2: Sparse Wikipedia Graph.
  • Figure 3: Wikispeedia Graph.
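Figures 1 and 2 contrast a dense and a sparse Wikipedia graph. The summary does not say how density is measured or how the two graphs are derived, so the sketch below is illustrative only: it computes the standard density of a simple directed graph, |E| / (|V| (|V| - 1)), that is, the fraction of possible directed links that are actually present. Treating Wikipedia links as directed edges and assuming node ids lie in `range(num_nodes)` are assumptions of this sketch, and the toy graphs are hypothetical.

```python
def directed_density(num_nodes, edges):
    """Density of a simple directed graph: |E| / (|V| * (|V| - 1)).

    Self-loops and duplicate links are ignored, so for node ids in
    range(num_nodes) the result lies in [0, 1].
    """
    if num_nodes < 2:
        return 0.0
    unique_links = {(u, v) for u, v in edges if u != v}
    return len(unique_links) / (num_nodes * (num_nodes - 1))

# Hypothetical toy graphs: a 4-article cycle versus a fully linked 4-article graph.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
complete = [(u, v) for u in range(4) for v in range(4) if u != v]
print(directed_density(4, cycle))     # 4 links out of 12 possible
print(directed_density(4, complete))  # 12 links out of 12 possible
```

A graph built from the same articles can thus range from a sparse cycle (density 1/3 here) to a fully linked graph (density 1), which is the kind of contrast the dense and sparse WCM graphs are meant to capture.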