Where Did the President Visit Last Week? Detecting Celebrity Trips from News Articles

Kai Peng; Ying Zhang; Shuai Ling; Zhaoru Ke; Haipeng Zhang

TL;DR

This paper tackles the challenge of detecting celebrity itineraries from news articles, a problem hindered by noisy long texts, trip evidence dispersed across articles, and implicit trip mentions. It introduces CeleTrip, a graph-based framework that builds Word-Article graphs for candidate locations, constructs entity sub-graphs from Wikidata via OpenKE, and learns event embeddings from contemporaneous news, all fused in a Trip Graph with Oriented Pooling to focus on the target celebrity. The authors release a large Trip Dataset (5,687 trips across 50 celebrities) and demonstrate that CeleTrip achieves an F1 score of 82.53%, outperforming strong baselines and generalizing to unseen celebrities; ablations confirm the importance of external knowledge and the specialized pooling mechanism. The work advances large-scale, network-wise analysis of celebrity mobility and offers practical tools for time/location extraction and graph-based itinerary detection with potential geopolitical and cultural insights.

Abstract

Celebrities' whereabouts are of pervasive importance. For instance, where politicians go, how often they visit, and whom they meet carry profound geopolitical and economic implications. Although news articles contain travel information about celebrities, large-scale and network-wise analysis has not been possible because automatic itinerary detection tools are lacking. To design such tools, we have to overcome difficulties arising from the heterogeneity among news articles: 1) A single article can be noisy, with irrelevant people and locations, especially when the article is long. 2) Although considering multiple articles together may help determine a particular trip, the key semantics are still scattered across different articles and intertwined with various kinds of noise, making them hard to aggregate effectively. 3) Over 20% of the articles refer to the celebrities' trips indirectly, instead of using the exact celebrity names or location names, so a large portion of trips escape standard detection algorithms. We model the text content across articles related to each candidate location as a graph to better associate essential information and cancel out the noise. In addition, we design a special pooling layer based on an attention mechanism and node similarity, which reduces irrelevant information from longer articles. To make up for the information missing due to indirect mentions, we construct knowledge sub-graphs for named entities (person, organization, facility, etc.). Specifically, we dynamically update the embeddings of event entities such as the G7 summit from news descriptions, since the properties (date and location) of an event change each time it is held, which pre-trained event representations do not capture. The proposed CeleTrip jointly trains these modules; it outperforms all baseline models and achieves an F1 score of 82.53%.
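
The abstract describes the pooling layer only as combining an attention mechanism with node similarity. As a rough illustration of that general idea, and not the paper's actual Oriented Pooling, the sketch below weights each node vector by the softmax of its cosine similarity to a target (celebrity) vector and returns the weighted average. The names `cosine` and `oriented_pool`, the choice of cosine similarity, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def oriented_pool(node_vecs, target_vec):
    """Hypothetical similarity-weighted pooling (an assumption, not CeleTrip's layer).

    Each node is scored by cosine similarity to the target vector, the scores
    are normalized with a softmax, and the node vectors are averaged with
    those weights. Returns (pooled_vector, weights).
    """
    if not node_vecs:
        raise ValueError("node_vecs must contain at least one vector")
    scores = [cosine(v, target_vec) for v in node_vecs]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target_vec)
    pooled = [sum(w * v[i] for w, v in zip(weights, node_vecs)) for i in range(dim)]
    return pooled, weights


# Toy example: three word/article nodes and a target pointing along the first axis.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [1.0, 0.0]
pooled, weights = oriented_pool(nodes, target)
```

In this toy setup the node aligned with the target receives the largest weight and the orthogonal node the smallest, so the pooled vector leans toward the target's direction. A trained layer would learn its scoring function rather than use a fixed cosine similarity.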

Paper Structure (42 sections, 13 equations, 5 figures, 6 tables)

Introduction
Problem Statement
Method
Location Embedding Learning
Word-Article Graph
Oriented Pooling
Entity Embedding Learning
Event Embedding Learning
Trip Graph Learning
Trip graph construction
Trip graph aggregation
Extraction Tools for Dates and Locations
Experiments
Data Collection
Data Processing and Labeling
...and 27 more sections

Figures (5)

Figure 1: Example of extracting trip information of Donald Trump from news articles on 2017-01-26.
Figure 2: Our framework learns the overall representations of candidate locations and classifies them in Trip Graph, where the textual description of each candidate location is incorporated by the Location Embedding Learning, the knowledge of related entities is supplied by Entity Embedding Learning, and the information of related events is obtained by Event Embedding Learning.
Figure 3: Sensitivity analysis of different parameters. The horizontal coordinates indicate the values of parameters, and the vertical coordinates indicate the value of corresponding metrics.
Figure 4: Top 5 and bottom 5 sentences ranked by attention values from the event embedding learning module.
Figure 5: Visualization of the learned representations for celebrity entities, through t-SNE.
