The X Types -- Mapping the Semantics of the Twitter Sphere

Ogen Schlachet Drukerman, Einat Minkov

TL;DR

This work considers a social KB of roughly 200K popular Twitter accounts, which denote entities of interest, associates them with fine-grained semantic types, and generates semantic embeddings of these social entities, demonstrating enhanced performance on the key task of entity similarity assessment using this information.

Abstract

Social networks form a valuable source of world knowledge, where influential entities correspond to popular accounts. Unlike factual knowledge bases (KBs), which maintain a semantic ontology, structured semantic information is not available on social media. In this work, we consider a social KB of roughly 200K popular Twitter accounts, which denote entities of interest. We elicit semantic information about those entities. In particular, we associate them with a fine-grained set of 136 semantic types, e.g., determine whether a given entity account belongs to a politician or a musical artist. In the absence of explicit type information on Twitter, we obtain semantic labels for a subset of the accounts via alignment with the KBs of DBpedia and Wikidata. Given the labeled dataset, we fine-tune a transformer-based text encoder to generate semantic embeddings of the entities based on the contents of their accounts. We then exploit this evidence alongside network-based embeddings to predict the entities' semantic types. In our experiments, we show high type prediction performance on the labeled dataset. Consequently, we apply our type classification model to all of the entity accounts in the social KB. Our analysis of the results offers insights about the global semantics of the Twitter sphere. We discuss downstream applications that should benefit from semantic type information and the semantic embeddings of social entities generated in this work. In particular, we demonstrate enhanced performance on the key task of entity similarity assessment using this information.

Paper Structure (35 sections, 5 figures, 8 tables)

Figures (5)

  • Figure 1: The ratio of Twitter entity accounts that we successfully aligned with DBpedia, listed per bin of 10K accounts, with bins ordered by descending popularity (bin #0 includes the 10K most popular accounts). As shown, the ratio of accounts that align with entries in DBpedia correlates with account popularity on Twitter; that is, popular Twitter entities are more likely to be covered by DBpedia.
  • Figure 2: Illustration of selected paths extracted from DBpedia's hierarchical structure of semantic types
  • Figure 3: The 20 most frequent semantic paths among the DBpedia pages that map to popular Twitter entities. Overall, these paths apply to 71.7% of all aligned Twitter entities.
  • Figure 4: A multi-step learning approach: First, we fine-tune BERT on individual tweets from our training set, attributing the account label to each tweet. The embeddings of tweets produced by the fine-tuned model are then aggregated (averaged) at the user level. Similarly, social embeddings of popular entity accounts followed by the user, which were learned from a large sample of the Twitter network, are also aggregated (averaged) to form a social network-based encoding of the user. The content- and social-based embeddings may then be concatenated and fed to a neural network, trained to predict the account's semantic label based on this multi-faceted evidence.
  • Figure 5: A visualization of the predicted distribution of semantic types among 200K popular Twitter accounts.
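
The feature construction described in the Figure 4 caption (average the tweet embeddings, average the social embeddings of followed accounts, then concatenate) can be illustrated with a minimal sketch. This is not the authors' code: the function names `mean_pool` and `build_account_features`, the toy vectors, and their dimensions are assumptions made for illustration, and the fine-tuned BERT encoder and the downstream classifier are left out entirely.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty collection of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("cannot average an empty collection of vectors")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_account_features(
    tweet_embeddings: Sequence[Sequence[float]],
    followed_embeddings: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical helper: concatenate content- and social-based encodings.

    tweet_embeddings: one vector per tweet of the account (assumed to come
        from a fine-tuned text encoder, which is not shown here).
    followed_embeddings: one vector per popular account followed by the user
        (assumed to be pre-trained network embeddings).
    """
    content_vec = mean_pool(tweet_embeddings)    # account-level content encoding
    social_vec = mean_pool(followed_embeddings)  # account-level social encoding
    return content_vec + social_vec              # list concatenation


# Toy example: two 2-d tweet embeddings and two 3-d social embeddings.
tweets = [[1.0, 3.0], [3.0, 5.0]]
followed = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
features = build_account_features(tweets, followed)
print(features)  # [2.0, 4.0, 1.0, 3.0, 5.0]
```

The resulting vector would then serve as input to a classifier over the semantic types; the two embedding spaces need not share a dimension, since they are concatenated rather than summed.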