Measuring and Addressing Indexical Bias in Information Retrieval

Caleb Ziems; William Held; Jane Dwivedi-Yu; Diyi Yang

TL;DR

The paper tackles indexical bias in information retrieval by introducing the PAIR framework and the unsupervised Duo bias metric, which assesses how the order of a ranking may skew users' perspectives. Duo relies on a polarization axis learned from the Wiki-Balance corpora via PCA and computes a bias score as $\text{Duo}(r) = 1 - \text{nDCG}(r, u_V)$ with $u_V(i,r) = \frac{1}{i} \sum_{j=1}^{i} (p_j - \bar{p})^2$, the variance in polarization $p_j$ over the top $i$ documents of ranking $r$. This enables automatic bias audits without labeled data. The authors validate Duo on synthetic and natural data, demonstrate a strong correlation with supervised bias metrics, and show in a behavioral study that Duo predicts the Search Engine Manipulation Effect (SEME) when users click through biased results. They audit a range of IR systems, revealing trade-offs between relevance and indexical bias and highlighting domain-specific weaknesses, particularly in politics and the environment. The work lays a foundation for automatic, scalable bias measurement and for potential reranking strategies that mitigate indexical bias in IR and related rank-ordered information systems.
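The Duo formula above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions: polarization scores $p_j$ are already given as plain numbers (the PCA step is skipped), $\bar{p}$ is the mean of the top $i$ documents, the nDCG discount is the usual $1/\log_2(i+1)$, and the ideal ranking is found by brute force over permutations, which is only feasible for short lists. The function names and the `rescale_worst` option are hypothetical. The option additionally rescales against the worst ordering, which is one possible way to make the most biased ranking score exactly 1, as the Figure 1 caption describes.

```python
import itertools
import math
from typing import Sequence


def prefix_variance(p: Sequence[float], i: int) -> float:
    """u_V(i, r): population variance of the first i polarization scores."""
    head = p[:i]
    mean = sum(head) / i
    return sum((x - mean) ** 2 for x in head) / i


def dcg(p: Sequence[float]) -> float:
    """Discounted sum of prefix variances (log2 discount is an assumption)."""
    return sum(
        prefix_variance(p, i) / math.log2(i + 1) for i in range(1, len(p) + 1)
    )


def duo(p: Sequence[float], rescale_worst: bool = False) -> float:
    """Sketch of Duo(r) = 1 - nDCG(r, u_V) for a short ranked list.

    The normaliser is the best DCG over all orderings of the same scores
    (brute force). If rescale_worst is True, the worst ordering is also
    mapped to nDCG = 0; this option is an assumption of this sketch.
    """
    scores = [dcg(perm) for perm in itertools.permutations(p)]
    best, worst = max(scores), min(scores)
    floor = worst if rescale_worst else 0.0
    if best - floor <= 0:
        # Degenerate case: every ordering is equally (un)balanced.
        return 0.0
    return 1.0 - (dcg(p) - floor) / (best - floor)


# Toy polarities: +1 for one side of an issue, -1 for the other.
biased = [1, 1, 1, -1, -1, -1]    # one side fills the top of the list
balanced = [1, -1, 1, -1, 1, -1]  # sides alternate

print(duo(balanced))                    # alternating order is the ideal here
print(duo(biased))                      # strictly larger than the balanced score
print(duo(biased, rescale_worst=True))  # worst ordering after rescaling
```

In this toy case the first three documents of the biased list share one polarity, so their prefix variance is zero, while the variance of the full list is 1 but is heavily discounted at rank 6.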

Abstract

Information Retrieval (IR) systems are designed to deliver relevant content, but traditional systems may not optimize rankings for fairness, neutrality, or the balance of ideas. Consequently, IR can often introduce indexical biases, or biases in the positional order of documents. Although indexical bias can demonstrably affect people's opinion, voting patterns, and other behaviors, these issues remain understudied as the field lacks reliable metrics and procedures for automatically measuring indexical bias. Towards this end, we introduce the PAIR framework, which supports automatic bias audits for ranked documents or entire IR systems. After introducing DUO, the first general-purpose automatic bias metric, we run an extensive evaluation of 8 IR systems on a new corpus of 32k synthetic and 4.7k natural documents, with 4k queries spanning 1.4k controversial issue topics. A human behavioral study validates our approach, showing that our bias metric can help predict when and how indexical bias will shift a reader's opinion.

Paper Structure (40 sections, 15 equations, 5 figures, 9 tables)

Introduction
Related Work
Classifying Stance, Leaning, and Ideology.
Diversifying and Debiasing IR.
Auditing Bias in IR.
Foundational Bias Corpora
Source.
Wiki-Balance$_\text{Synthetic}$
Wiki-Balance$_\text{Natural}$
Wiki-Balance Quality Estimation
Human Evaluation.
Safety.
Indexical Bias Metrics
Prior Metrics
The Duo Bias Metric
...and 25 more sections

Figures (5)

Figure 1: The Search Engine Manipulation Effect as predicted by our Duo metric over a set of documents favoring Capitalism or Socialism. If users read a pro-Capitalism list, they will be more likely to adopt a Capitalist position, and our metric reflects this. The ranking with the most balanced order (right) gets the minimal score of Duo=0, whereas the ranking with the greatest possible indexical bias (left) gets the maximal score of Duo=1. Duo uses a discounted sum of variances $\sigma^2_{1:i}$ in polarization across documents' embeddings. On the left, the first 3 Capitalist articles have zero variance in polarity, so $\sigma^2_{1:3}=0$. The full list has a variance of $\sigma^2_{1:6}=1$, but since this balance appears far down the ranking, $\sigma^2_{1:6}$ is highly discounted.
Figure 2: Wiki-Balance Pipeline. First we pull 1,364 controversial topics from English Wikipedia. Data is under CC BY-SA License and is consistent with intended use. Then we generate 10 debate questions about each topic (examples are abbreviated in this figure). For each debate question, we generate 8 polarized documents, with 4 on each side of the initial axis generated by the LLM.
Figure 3: Human Behavioral Study Interface, used to determine whether biased search results lead to the SEME. Participants read a query (Q) with a given Perspective (Persp) and rate their agreement with the Perspective on a scale from disagree (-3) to agree (3). After reading a manipulated list of up to 10 search results, they summarize their informed opinion and give their updated agreement on the same scale from -3 to 3. We expect more biased results to shift their opinions more radically.
Figure 4: Wikipedia Controversial Topic Distribution can reflect biases in the Wikipedia editor pool. This explains why localized political topics are typically from English-speaking countries (71.69%), and why American issues are over-represented.
Figure 5: Distributions of Polarization Scores in the Synthetic (top) and Natural (bottom) corpus. Synthetic data is bimodal and polarized, while Natural data is normally distributed and thus more neutral.
