EqualizeIR: Mitigating Linguistic Biases in Retrieval Models

Jiali Cheng, Hadi Amiri

TL;DR

The paper tackles linguistic biases in neural IR models that cause performance gaps across queries with different linguistic complexity. It proposes EqualizeIR, a two-stage framework that first trains a linguistically biased weak learner and then regularizes a robust model by fusing biased signals with robust predictions via $\log(z_D) = \sigma(\alpha \log(z_B) + \log(z_R))$, where $\alpha \in [0,1]$. Key contributions include quantifying linguistic complexity with 45 metrics, introducing four strategies to produce biased weak learners, and demonstrating improved average retrieval performance with reduced bias on BEIR benchmarks (implemented on DPR as a case study). This approach offers a practical path toward fairer and more reliable IR across diverse linguistic styles, with broad applicability to dense retrieval settings and beyond.
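The fusion rule in the TL;DR can be illustrated with a small sketch. It rests on several assumptions that the summary does not confirm: $\sigma$ is read as a softmax over the candidate documents of one query, $z_B$ and $z_R$ are probability distributions over those candidates, and the fused distribution feeds a negative log-likelihood ranking loss. The function names `fuse` and `ranking_loss` are hypothetical and are not taken from the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(z_biased, z_robust, alpha=0.5):
    """Combine biased and robust candidate probabilities.

    Assumption: sigma in the TL;DR formula is a softmax, so the result is
    softmax(alpha * log(z_B) + log(z_R)), i.e. proportional to z_B**alpha * z_R.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    combined = [
        alpha * math.log(b) + math.log(r)
        for b, r in zip(z_biased, z_robust)
    ]
    return softmax(combined)


def ranking_loss(z_fused, positive_index):
    """Negative log-likelihood of the relevant document (assumed loss)."""
    return -math.log(z_fused[positive_index])


# Toy example: three candidate documents, index 0 is the relevant one.
z_b = [0.7, 0.2, 0.1]  # frozen biased weak learner f_B
z_r = [0.5, 0.3, 0.2]  # robust model f_R being trained

fused = fuse(z_b, z_r, alpha=1.0)
loss_with_bias = ranking_loss(fused, 0)
loss_without_bias = ranking_loss(fuse(z_b, z_r, alpha=0.0), 0)
```

Under these assumptions, `alpha = 0` recovers the robust distribution unchanged. When the biased learner is already confident about the relevant document, the fused loss is smaller than the loss of the robust model alone, so examples that the bias already explains contribute less to training $f_R$.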

Abstract

This study finds that existing information retrieval (IR) models show significant biases based on the linguistic complexity of input queries, performing well on linguistically simpler (or more complex) queries while underperforming on linguistically more complex (or simpler) queries. To address this issue, we propose EqualizeIR, a framework to mitigate linguistic biases in IR models. EqualizeIR uses a linguistically biased weak learner to capture linguistic biases in IR datasets and then trains a robust model by regularizing and refining its predictions using the biased weak learner. This approach effectively prevents the robust model from overfitting to specific linguistic patterns in data. We propose four approaches for developing linguistically-biased models. Extensive experiments on several datasets show that our method reduces performance disparities across linguistically simple and complex queries, while improving overall retrieval performance.

Paper Structure

This paper contains 23 sections, 3 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: NDCG@10 of BM25 as the average linguistic complexity [lu2010automatic, lu2012relationship] of queries increases. On the test set of NFCorpus [nfcorpus] (left), NDCG@10 drops significantly, from 0.4 to 0; on the test set of FiQA [fiqa] (right), it rises significantly, from 0.2 to 0.3. The result shows that BM25 is significantly biased, toward linguistically easy examples on one dataset and toward linguistically hard examples on the other.
  • Figure 2: Architecture of EqualizeIR for mitigating linguistic biases in IR models. (a) Training process: first, a linguistically biased IR model $f_B$ is trained. Then the parameters of $f_B$ are frozen, and a target, linguistically robust IR model $f_R$ is trained by taking the product of logits of $f_B$ and $f_R$. The biased weak learner regularizes the ranking loss of $f_R$ using its learned linguistic biases. (b) Examples showing that the ensemble approach effectively moderates prediction probabilities to avoid learning biases associated with high confidence or moving too heavily toward the biased weak learner. (c) Strategies for developing linguistically biased weak learners.
  • Figure 3: NDCG@10 of EqualizeIR and DPR [karpukhin2020dense] as the linguistic complexity of queries increases. Detailed performance of all baselines is shown in Figure \ref{fig:ndcg} in Appendix \ref{sec:app}.
  • Figure 4: Performance in NDCG@10 as the linguistic complexity of queries increases.
  • Figure 5: Performance of the weak learners $f_B$ obtained by the four different strategies; all of them are highly linguistically biased.
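
The figures above plot NDCG@10 against query linguistic complexity. A minimal sketch of how such a curve could be computed follows. It assumes each query carries a single scalar complexity score, uses the linear-gain form of DCG, and groups queries into equal-size bins after sorting by complexity. The binning scheme and the helper names `ndcg_at_k` and `ndcg_by_complexity` are illustrative assumptions, not the authors' evaluation code.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(ranked_rels, judged_rels, k=10):
    """NDCG@k: DCG of the system ranking divided by the ideal DCG.

    ranked_rels: relevance labels of retrieved documents, in ranked order.
    judged_rels: relevance labels of all judged documents for the query.
    """
    ideal = dcg(sorted(judged_rels, reverse=True), k)
    return dcg(ranked_rels, k) / ideal if ideal > 0 else 0.0


def ndcg_by_complexity(queries, n_bins=3, k=10):
    """Mean NDCG@k per complexity bin, from least to most complex.

    queries: list of (complexity_score, ranked_rels, judged_rels) tuples.
    Bins are equal-size groups of queries sorted by complexity (an assumption).
    """
    if not queries:
        return []
    ordered = sorted(queries, key=lambda q: q[0])
    size = math.ceil(len(ordered) / n_bins)
    means = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        scores = [ndcg_at_k(ranked, judged, k) for _, ranked, judged in chunk]
        means.append(sum(scores) / len(scores))
    return means


# Toy data: simpler queries are answered well, complex ones poorly.
toy_queries = [
    (0.1, [1, 0], [1]),
    (0.2, [1, 0], [1]),
    (0.8, [0, 1], [1]),
    (0.9, [0, 0], [1]),
]
per_bin = ndcg_by_complexity(toy_queries, n_bins=2)
```

A flat `per_bin` list would indicate little linguistic bias, while a steadily falling or rising list corresponds to the kind of disparity the figures describe.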