Can Fairness Be Prompted? Prompt-Based Debiasing Strategies in High-Stakes Recommendations

Mihaela Rotar, Theresia Veronika Rampisela, Maria Maistro

Abstract

Large Language Models (LLMs) can infer sensitive attributes such as gender or age from indirect cues like names and pronouns, potentially biasing recommendations. While several debiasing methods exist, they require access to the LLMs' weights, are computationally costly, and cannot be used by lay users. To address this gap, we investigate implicit biases in LLM Recommenders (LLMRecs) and explore whether prompt-based strategies can serve as a lightweight and easy-to-use debiasing approach. We contribute three bias-aware prompting strategies for LLMRecs. To our knowledge, this is the first study on prompt-based debiasing approaches in LLMRecs that focuses on group fairness for users. Our experiments with 3 LLMs, 4 prompt templates, 9 sensitive attribute values, and 2 datasets show that our proposed debiasing approach, which instructs an LLM to be fair, can improve fairness by up to 74% while retaining comparable effectiveness, but might overpromote specific demographic groups in some cases.

Paper Structure

This paper contains 4 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Our contributions with actual examples from our news recommendation experiments. More similar responses from neutral and sensitive prompt variants mean less biased recommendations. We find that in some cases, bias-aware prompts could give over-adjusted responses based on implicit sensitive attributes (e.g., pronoun-inferred gender).
  • Figure 2: Neutral prompts: baseline (a) and bias-aware (b)–(d). Sensitive prompts are obtained by replacing 'this user' with a pronoun or social role. Text repeated from (a) is grayed out. The placeholder rec_type is replaced by jobs or news, and history lists 10 interacted items.
  • Figure 3: Recommendation similarity of neutral vs. sensitive variants, with Jaccard (top) and BERTScore (bottom) for the fairest LLMs.
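
The captions above describe two mechanical steps: deriving a sensitive prompt from a neutral one by replacing 'this user', and comparing the resulting recommendation lists with Jaccard similarity. The following is a minimal sketch of both steps, not the authors' implementation. The template wording, the helper names `make_sensitive_prompt` and `jaccard`, the pronoun "her", and the example item lists are all assumptions made for illustration; the paper's actual prompts are the ones shown in Figure 2.

```python
def make_sensitive_prompt(neutral_prompt: str, user_reference: str) -> str:
    """Replace the neutral phrase 'this user' with a pronoun or social role."""
    return neutral_prompt.replace("this user", user_reference)


def jaccard(recs_a, recs_b) -> float:
    """Jaccard similarity of two recommendation lists, treated as sets."""
    set_a, set_b = set(recs_a), set(recs_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical template: the wording is illustrative, not the paper's prompt.
TEMPLATE = "Recommend {rec_type} to this user based on the history: {history}"

neutral_prompt = TEMPLATE.format(rec_type="news", history="item_1, item_2")
sensitive_prompt = make_sensitive_prompt(neutral_prompt, "her")

# Made-up recommendation lists standing in for LLM responses to each prompt.
neutral_recs = ["a", "b", "c", "d"]
sensitive_recs = ["b", "c", "d", "e"]

# 3 shared items out of 5 distinct items.
similarity = jaccard(neutral_recs, sensitive_recs)
print(sensitive_prompt)
print(similarity)
```

Under this reading, a similarity closer to 1 means the neutral and sensitive variants received more similar recommendations, which the Figure 1 caption interprets as less bias. BERTScore, the second measure in Figure 3, compares responses semantically rather than by exact item overlap and is not sketched here.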