Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia

Farhan Samir; Chan Young Park; Anjalie Field; Vered Shwartz; Yulia Tsvetkov

TL;DR

The paper introduces InfoGap, an efficient and reliable method for locating information gaps and inconsistencies between articles at the fact level, across languages. InfoGap pinpoints local document- and fact-level information gaps, laying a new foundation for targeted and nuanced comparative language analysis at scale.

Abstract

To explain social phenomena and identify systematic biases, much research in computational social science focuses on comparative text analyses. These studies often rely on coarse corpus-level statistics or local word-level analyses, mainly in English. We introduce the InfoGap method, an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level, across languages. We evaluate InfoGap by analyzing LGBT people's portrayals across 2.7K biography pages on English, Russian, and French Wikipedias. We find large discrepancies in factual coverage across the languages. Moreover, our analysis reveals that biographical facts carrying negative connotations are more likely to be highlighted in Russian Wikipedia. Crucially, InfoGap both facilitates large-scale analyses and pinpoints local document- and fact-level information gaps, laying a new foundation for targeted and nuanced comparative language analysis at scale.

Paper Structure (48 sections, 2 theorems, 3 equations, 3 figures, 9 tables)

Introduction
InfoGap: Identifying Information Asymmetry in Wikipedia Articles
X-FactAlign: Cross-Lingual Fact Alignment
Fact Decomposition.
Fact Representation.
Paragraph Alignment.
Correcting for Hubness.
X-FactMatch: Cross-Lingual Fact Matching
Assessing the Reliability of InfoGap
Using InfoGap to Analyze Asymmetries in LGBT Wikipedia Bios
Implementation Details
RQ$_1$: Information Gaps in Bios
RQ$_2$: Effect of LGBT Affiliation on Information Gaps
Features.
Regression Model.
...and 33 more sections

Key Result

Proposition 1

The probability of InfoGap making $k$ errors is $\leq \exp(-2(1-\epsilon)^2 k)$, where $\epsilon$ is the error rate of the classifier when it predicts $F \not\vDash e_i$ (i.e., that $F$ does not entail the fact $e_i$).
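To make the exponential decay of this bound concrete, the short Python sketch below simply evaluates $\exp(-2(1-\epsilon)^2 k)$ for a few values of $k$. The function name `infogap_error_bound` and the error rate $\epsilon = 0.2$ are hypothetical choices for illustration; they are not taken from the paper.

```python
import math


def infogap_error_bound(k: int, epsilon: float) -> float:
    """Evaluate the Proposition 1 bound exp(-2 * (1 - epsilon)**2 * k).

    k: number of errors (non-negative integer).
    epsilon: classifier error rate, assumed here to lie in [0, 1].
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    return math.exp(-2.0 * (1.0 - epsilon) ** 2 * k)


# Hypothetical error rate, chosen only to illustrate how fast the bound decays.
eps = 0.2
for k in (1, 2, 5, 10):
    print(f"k={k:2d}  bound={infogap_error_bound(k, eps):.3g}")
```

Each additional error multiplies the bound by $\exp(-2(1-\epsilon)^2)$, so with $\epsilon = 0.2$ the bound shrinks by a factor of about 0.28 per error; as $\epsilon$ approaches 1, the bound approaches 1 and becomes uninformative.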

Figures (3)

Figure 1: We propose a method, InfoGap, to locate fact (mis)alignments in Wikipedia biographies in different language versions. InfoGap identifies facts that are common to a pair of articles ("Griner was born on October 18, 1990"), and facts unique to one language version ("Griner had recorded the sixth triple-double"; En only) enabling further analysis of information gaps, editors' selective preferences within articles, and analyses at scale across languages, cultures, and demographics.
Figure 2: Schematic of the InfoGap procedure. We describe the Fact Decomposition and Multilingual Alignment steps in the X-FactAlign section (Cross-Lingual Fact Alignment), and the Alignment Verification step in the X-FactMatch section (Cross-Lingual Fact Matching).
Figure 3: Distribution of information overlaps for LGBTBioCorpus. Top: Distribution over the percentage of facts in En biographies also found in their Fr and Ru counterparts. Bottom: Distribution over the percentage of facts in Fr and Ru biographies also found in their English counterparts. $N=2,700$ biographies. In general, En biographies contain more facts that are exclusive to En.
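Figure 3 summarizes, per biography, the share of facts that also appear in the other-language article. A minimal Python sketch of that statistic follows, assuming InfoGap has already labeled each fact as matched or unmatched. The input format (a dict of per-fact booleans), the function names, and the 10-point bins are hypothetical choices for illustration, not the authors' code.

```python
from collections import Counter
from typing import Dict, List


def overlap_percentage(fact_found: List[bool]) -> float:
    """Percentage of one biography's facts that were also found in its counterpart."""
    if not fact_found:
        raise ValueError("biography has no facts")
    return 100.0 * sum(fact_found) / len(fact_found)


def overlap_histogram(bios: Dict[str, List[bool]], bin_width: int = 10) -> Dict[int, int]:
    """Count biographies per overlap bin, keyed by each bin's lower edge (in percent)."""
    if bin_width <= 0 or 100 % bin_width != 0:
        raise ValueError("bin_width must be a positive divisor of 100")
    counts: Counter = Counter()
    for labels in bios.values():
        pct = overlap_percentage(labels)
        # Fold exactly 100% into the top bin rather than opening a [100, 110) bin.
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical labels for three En biographies (True = fact also found in the Fr version).
toy = {
    "bio_a": [True, False, False, True],
    "bio_b": [True, True, True],
    "bio_c": [False, False, False, False, True],
}
print({name: overlap_percentage(labels) for name, labels in toy.items()})
print(overlap_histogram(toy))
```

Running the same computation in both directions (En facts found in Fr or Ru, and Fr or Ru facts found in En) yields the two kinds of distributions plotted in the top and bottom panels.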

Theorems & Definitions (3)

Proposition 1: Error Bound of Event Identification through InfoGap
Proof
Theorem 1: Hoeffding's inequality
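For reference, the standard one-sided form of Hoeffding's inequality for independent variables bounded in $[0,1]$ is given below. How the paper defines the variables $X_i$ is not reproduced on this page; the final line only checks that the substitution $t = (1-\epsilon)k$ yields the same expression as the Proposition 1 bound.

```latex
% Hoeffding's inequality (one-sided): independent X_1, ..., X_k with values in [0, 1]
\Pr\left( S_k - \mathbb{E}[S_k] \ge t \right) \le \exp\!\left( -\frac{2t^2}{k} \right),
\qquad S_k = \sum_{i=1}^{k} X_i,\; t > 0.

% Substituting t = (1 - \epsilon) k:
\exp\!\left( -\frac{2(1-\epsilon)^2 k^2}{k} \right) = \exp\!\left( -2(1-\epsilon)^2 k \right)
```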
