When Life Paths Cross: Extracting Human Interactions in Time and Space from Wikipedia

Zhongyang Liu, Ying Zhang, Xiangyi Xiao, Wenting Liu, Yuanting Zha, Haipeng Zhang

TL;DR

The paper addresses the scarcity of large-scale spatio-temporal interaction data by mining Wikipedia biographies, extracting 685,966 interaction quadruplets and building the labeled WikiInteraction dataset. It introduces FALCON, an AR-BERT-based framework that uses multi-task learning and feature transfer to jointly extract interactions and verify trajectory co-occurrence. FALCON reaches a top F1 of 86.51% on WikiInteraction and generalizes to Encyclopedia Britannica. The extracted data supports large-scale analyses, such as a study of US political polarization through inter- and intra-party interactions. The authors release their code, the extracted interaction data, and the annotated WikiInteraction dataset, offering a scalable approach to longitudinal, location-aware studies of social interaction in historical and cultural analytics.

Abstract

Interactions among notable individuals -- whether examined individually, in groups, or as networks -- often convey significant messages across cultural, economic, political, scientific, and historical perspectives. By analyzing the times and locations of these interactions, we can observe how dynamics unfold across regions over time. However, relevant studies are often constrained by data scarcity, particularly concerning the availability of specific location and time information. To address this issue, we mine millions of biography pages from Wikipedia, extracting 685,966 interaction records in the form of (Person1, Person2, Time, Location) interaction quadruplets. The key elements of these interactions are often scattered throughout the heterogeneous crowd-sourced text and may be loosely or indirectly associated. We overcome this challenge by designing a model that integrates attention mechanisms, multi-task learning, and feature transfer methods, achieving an F1 score of 86.51%, which outperforms baseline models. We further conduct an empirical analysis of intra- and inter-party interactions among political figures to examine political polarization in the US, showcasing the potential of the extracted data for analyses that may not be possible without it. We make our code, the extracted interaction data, and the WikiInteraction dataset of 4,507 labeled interaction quadruplets publicly available.
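
To make the (Person1, Person2, Time, Location) record format concrete, here is a minimal Python sketch of how such quadruplets might be represented and grouped by pair. The class name `Interaction`, the helpers `pair_key` and `count_by_pair`, the string-typed fields, and the placeholder records are all assumptions made for illustration; they are not taken from the released code or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One hypothetical interaction quadruplet."""
    person1: str
    person2: str
    time: str       # assumed to be a plain string, e.g. a year
    location: str


def pair_key(record):
    """Return an order-independent key for the two people in a record."""
    return tuple(sorted((record.person1, record.person2)))


def count_by_pair(records):
    """Count how many interaction records each unordered pair has."""
    return Counter(pair_key(r) for r in records)


# Placeholder records, not drawn from the dataset.
records = [
    Interaction("Person A", "Person B", "1961", "City X"),
    Interaction("Person B", "Person A", "1963", "City Y"),
    Interaction("Person A", "Person C", "1961", "City X"),
]

counts = count_by_pair(records)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Sorting the two names treats an interaction as symmetric, so (A, B) and (B, A) count as the same pair. Whether the released data stores pairs in a canonical order is not stated here, so that normalization is a precaution, not a documented requirement.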

Paper Structure

This paper contains 48 sections, 14 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Example of extracted quadruples and their contexts. (a) An example of correct spatio-temporal interaction. (b) An example of incorrect spatio-temporal interaction.
  • Figure 2: The framework of our method.
  • Figure 3: The architecture of AR-BERT.
  • Figure 4: The evolution in the ratios of different types of inter-party interactions.
  • Figure 5: US political interaction network (1960-2024) with nodes colored by party (red=Republican, blue=Democrat) and edges weighted by interaction type: Neutral (1), Cooperative (2), Adversarial (-2).
  • ...and 8 more figures