When Life Paths Cross: Extracting Human Interactions in Time and Space from Wikipedia
Zhongyang Liu, Ying Zhang, Xiangyi Xiao, Wenting Liu, Yuanting Zha, Haipeng Zhang
TL;DR
The paper tackles the lack of large-scale spatio-temporal interaction data by mining Wikipedia biographies to extract 685,966 interaction quadruplets and constructing the WikiInteraction dataset. It introduces FALCON, an AR-BERT-based framework that uses multi-task learning and feature transfer to jointly extract interactions and verify trajectory co-occurrence. Empirical results show that FALCON achieves a top F1 score of 86.51% on WikiInteraction and generalizes to Encyclopedia Britannica, enabling large-scale analyses such as a study of US political polarization through inter- and intra-party interactions. The work provides open-source code, the annotated WikiInteraction data, and a scalable approach to longitudinal, location-aware studies of social interaction, with broad applicability to historical and cultural analytics.
Abstract
Interactions among notable individuals -- whether examined individually, in groups, or as networks -- often convey significant messages across cultural, economic, political, scientific, and historical perspectives. By analyzing the times and locations of these interactions, we can observe how dynamics unfold across regions over time. However, relevant studies are often constrained by data scarcity, particularly the limited availability of specific location and time information. To address this issue, we mine millions of biography pages from Wikipedia, extracting 685,966 interaction records in the form of (Person1, Person2, Time, Location) quadruplets. The key elements of these interactions are often scattered throughout the heterogeneous crowd-sourced text and may be only loosely or indirectly associated. We overcome this challenge by designing a model that integrates attention mechanisms, multi-task learning, and feature transfer methods, achieving an F1 score of 86.51%, which outperforms baseline models. We further conduct an empirical analysis of intra- and inter-party interactions among political figures to examine political polarization in the US, showcasing the potential of the extracted data and offering a perspective that may not be attainable without it. We make our code, the extracted interaction data, and the WikiInteraction dataset of 4,507 labeled interaction quadruplets publicly available.
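To make the (Person1, Person2, Time, Location) record format concrete, the following minimal Python sketch shows one way such quadruplets could be represented and tallied into intra- and inter-party interactions. It is an illustration only, not the authors' code: the class name `InteractionQuadruplet`, the function `count_party_interactions`, the `party_of` mapping, the string format of the time field, and all sample records are hypothetical placeholders and do not come from the released data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionQuadruplet:
    """One interaction record; field names are assumptions for illustration."""
    person1: str
    person2: str
    time: str       # assumed to be a year string; the real format may differ
    location: str


def count_party_interactions(quadruplets, party_of):
    """Count intra- vs inter-party interactions.

    `party_of` maps a person's name to a party label (a hypothetical lookup).
    Records where either person has no known party are skipped.
    """
    counts = Counter()
    for q in quadruplets:
        p1 = party_of.get(q.person1)
        p2 = party_of.get(q.person2)
        if p1 is None or p2 is None:
            continue
        counts["intra" if p1 == p2 else "inter"] += 1
    return counts


# Placeholder records, not taken from the dataset.
records = [
    InteractionQuadruplet("Person A", "Person B", "1990", "City X"),
    InteractionQuadruplet("Person A", "Person C", "1995", "City Y"),
    InteractionQuadruplet("Person B", "Person D", "2001", "City X"),
]
party_of = {"Person A": "Party 1", "Person B": "Party 1", "Person C": "Party 2"}

result = count_party_interactions(records, party_of)
print(dict(result))
```

Grouping the same counts by the `time` field would give a rough longitudinal view of how the two kinds of interaction change, which is the sort of analysis the abstract describes for US political figures.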
