
Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu

TL;DR

This work introduces Perspective-aware Information Retrieval (PIR), a benchmark that reformats six real-world datasets to test whether retrievers distinguish query perspectives beyond semantic relevance. It reveals that existing retrievers exhibit limited perspective sensitivity and notable biases, motivating a zero-shot, projection-based remedy called Perspective-aware Projection (PAP). PAP improves perspective sensitivity by projecting query and, optionally, corpus embeddings onto a perspective plane, yielding consistent gains in $p$-Recall@5 and tangible downstream benefits for tasks like AmbigQA and Perspectrum. The study also shows the feasibility of automatic perspective extraction and discusses biases in retrieval that can affect downstream fairness, underscoring the practical impact for retrieval-augmented generation and fairness-aware systems.
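The TL;DR describes PAP only at a high level: the query embedding and, optionally, the corpus embeddings are projected onto a perspective plane. The sketch below illustrates one possible reading of that idea and is not the authors' formulation. It assumes, hypothetically, that the plane is spanned by the embedding of the root query and the embedding of the free-form perspective text. It builds an orthonormal basis for that plane with Gram-Schmidt and ranks documents by cosine similarity after projection. Plain Python lists stand in for retriever embeddings, and the helper names (`plane_basis`, `project`, `rank`) are illustrative only.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def plane_basis(root: Sequence[float], perspective: Sequence[float]) -> List[Vector]:
    """Orthonormal basis of span{root, perspective} via Gram-Schmidt (hypothetical plane choice)."""
    e1 = normalize(root)
    c = dot(perspective, e1)
    residual = [p - c * e for p, e in zip(perspective, e1)]
    e2 = normalize(residual)  # raises ValueError if perspective is parallel to root
    return [e1, e2]


def project(v: Sequence[float], basis: List[Vector]) -> Vector:
    """Orthogonal projection of v onto the subspace spanned by an orthonormal basis."""
    out = [0.0] * len(v)
    for e in basis:
        c = dot(v, e)
        out = [o + c * x for o, x in zip(out, e)]
    return out


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)


def rank(query: Sequence[float], corpus: List[Vector],
         basis: Optional[List[Vector]] = None, project_corpus: bool = False) -> List[int]:
    """Return corpus indices sorted by cosine similarity to the (projected) query, best first."""
    q = project(query, basis) if basis is not None else list(query)
    docs = [project(d, basis) if basis is not None and project_corpus else list(d)
            for d in corpus]
    scores = [cosine(q, d) for d in docs]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


# Toy 3-d example: axis 0 ~ topic, axis 1 ~ "opposes" perspective, axis 2 ~ noise.
root = [1.0, 0.0, 0.0]
perspective = [0.5, 1.0, 0.0]        # need not be orthogonal to root
query = [1.0, 0.3, 0.9]              # weak perspective signal, strong noise
corpus = [[1.0, -0.5, 1.0],          # doc 0: on topic, opposite stance, noisy
          [1.0, 0.8, 0.0]]           # doc 1: on topic, desired stance
basis = plane_basis(root, perspective)
print(rank(query, corpus))                              # [0, 1]
print(rank(query, corpus, basis))                       # [1, 0]
print(rank(query, corpus, basis, project_corpus=True))  # [1, 0]
```

Projecting only the query leaves the document index untouched, while also projecting the corpus requires re-projecting every document for each plane. With these toy vectors, both variants move the on-stance document to the top. Real retriever embeddings are high-dimensional, so the toy only illustrates the geometry.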

Abstract

The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected not only to rely on the semantic relevance between documents and queries but also to recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting and contradicting perspectives, so that the downstream system can make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of queries: beyond finding relevant documents for a claim, can retrievers distinguish supporting from opposing documents? We reformulate and extend six existing tasks to create a retrieval benchmark in which, in addition to root (neutral) queries, diverse perspectives are described in free-form text. We show that the retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by this observation, we further explore leveraging geometric features of the retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.

Paper Structure (35 sections, 2 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: An example of how perspective-aware information retrieval differs from the current retrieval pipeline. Perspectives that further specify the intent, e.g., "Article that opposes", influence the ranks of relevant articles and hence downstream task performance.
  • Figure 2: Retrieval performance (Recall@5) of queries with or without perspectives, macro-averaged over all the retrievers.
  • Figure 3: Expected proportion of news articles from the desired country or other countries in the top 5 retrieval results with SimCSE-sup on AGNews. Queries come from the location perspective, e.g., "Find a news article on X topic that happened in Y", where Y is the desired country. Retrievers show imbalanced performance across countries: for example, users seeking news from Guatemala are less likely to receive satisfactory results than users seeking news from Colombia. The corpus is constructed to contain an equal number of articles per country.
  • Figure 4: Retrieval performance (Recall@k, k=5,10) between queries with or without perspectives attached. Per-task performance is the macro-average of all the retrievers.
  • Figure 5: Accumulated numbers of news articles in the top 5 retrieval results with SimCSE-sup on AGNews root queries. An example root query is: "Find a news article that is similar to: Article X". Retrievers prefer news articles from certain countries, e.g., Brazil. The corpus and the root queries are constructed to contain an equal number of articles per country.
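The figures above report Recall@k and, for AGNews, how countries are distributed among the top 5 results. Below is a minimal sketch of how such numbers could be computed from ranked lists. It assumes that Recall@k is |top-k ∩ gold| / |gold| and that each document carries a single country label. The data structures and function names are hypothetical. The paper's $p$-Recall@5 is not reproduced here because its exact definition is not given in this section.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Mapping, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], gold: Set[Hashable], k: int) -> float:
    """Fraction of the gold documents that appear among the top-k ranked ids."""
    if not gold:
        raise ValueError("gold set must be non-empty")
    return len(set(ranked[:k]) & gold) / len(gold)


def macro_recall_at_k(runs: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]], k: int) -> float:
    """Average Recall@k over queries; each run is (ranked ids, gold ids)."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


def desired_country_share(runs: Sequence[Tuple[Sequence[Hashable], str]],
                          doc_country: Mapping[Hashable, str], k: int = 5) -> Dict[str, float]:
    """Per desired country, mean fraction of top-k results from that country (cf. Figure 3)."""
    shares: Dict[str, List[float]] = defaultdict(list)
    for ranked, desired in runs:
        top = ranked[:k]
        if not top:
            continue  # skip empty result lists instead of dividing by zero
        hits = sum(1 for doc in top if doc_country[doc] == desired)
        shares[desired].append(hits / len(top))
    return {country: sum(v) / len(v) for country, v in shares.items()}


def country_counts(ranked_lists: Sequence[Sequence[Hashable]],
                   doc_country: Mapping[Hashable, str], k: int = 5) -> Counter:
    """Accumulated country counts over the top-k of every ranked list (cf. Figure 5)."""
    return Counter(doc_country[doc] for ranked in ranked_lists for doc in ranked[:k])


# Toy data: four documents with country labels.
doc_country = {"a": "Brazil", "b": "Colombia", "c": "Guatemala", "d": "Brazil"}
runs = [(["a", "b", "c"], {"b"}), (["c", "d", "a"], {"a"})]
print(macro_recall_at_k(runs, k=2))  # 0.5
loc_runs = [(["a", "b", "d"], "Brazil"), (["c", "a", "b"], "Guatemala")]
print(desired_country_share(loc_runs, doc_country, k=3))
print(country_counts([r for r, _ in loc_runs], doc_country, k=2))
```

Dividing by |gold| means Recall@k stays below 1 whenever there are more gold documents than k. Some evaluations divide by min(k, |gold|) instead.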