
Mitigating the Threshold Priming Effect in Large Language Model-Based Relevance Judgments via Personality Infusing

Nuo Chen, Hanpei Fang, Jiqun Liu, Wilson Wei, Tetsuya Sakai, Xiao-Ming Wu

TL;DR

The paper addresses threshold priming in LLM-based relevance judgments and proposes simulating Big Five personality traits to mitigate this bias via personality prompting. It introduces a two-step approach: generate trait-based simulation prompts, then conduct batched relevance assessments with epilogue/prologue context, measuring priming via the mean delta Δ. Experiments on the TREC DL 2021/2022 datasets with three LLMs show that traits such as High Openness and Low Neuroticism reduce priming, but the optimal traits are model- and task-dependent. The work provides a practical method for improving the reliability of LLM evaluators and opens a pathway for integrating psychological theory into bias mitigation in AI.
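The two-step pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The persona wording in `TRAIT_TEXT`, the helper names `persona_prompt` and `batch_prompt`, and the prompt layout (priming "prologue" passages placed before the target passages in a single batch) are all hypothetical, and the epilogue context mentioned in the summary is not modeled. The 0-3 scale follows the graded relevance levels used in the TREC Deep Learning Track.

```python
from dataclasses import dataclass

# Hypothetical persona descriptions for a few Big Five poles; the paper's
# actual trait prompts are not reproduced in this summary.
TRAIT_TEXT = {
    ("openness", "high"): "You are curious, imaginative, and open to unfamiliar ideas.",
    ("openness", "low"): "You prefer the familiar and are skeptical of unconventional ideas.",
    ("neuroticism", "high"): "You are easily stressed and tend to worry about making mistakes.",
    ("neuroticism", "low"): "You are calm, emotionally stable, and rarely anxious.",
}


@dataclass
class Passage:
    pid: str
    text: str


def persona_prompt(trait: str, level: str) -> str:
    """Step 1 (assumed form): turn a trait pole into a persona instruction."""
    return TRAIT_TEXT[(trait, level)] + " Answer every task in this persona."


def batch_prompt(persona: str, query: str,
                 prologue: list[Passage], targets: list[Passage]) -> str:
    """Step 2 (assumed layout): one batched judgment prompt in which the
    priming passages (prologue) precede the target passages."""
    lines = [
        persona,
        "",
        f"Query: {query}",
        "Rate each passage's relevance to the query on a 0-3 scale "
        "(0 = irrelevant, 3 = perfectly relevant).",
        "",
    ]
    # Number passages in presentation order so earlier ones can prime later ones.
    for i, p in enumerate(prologue + targets, start=1):
        lines.append(f"Passage {i} [{p.pid}]: {p.text}")
    lines.append("")
    lines.append("Reply with one line per passage in the form '<number>: <score>'.")
    return "\n".join(lines)


if __name__ == "__main__":
    persona = persona_prompt("openness", "high")
    primers = [Passage("p1", "An off-topic passage."), Passage("p2", "Another weak match.")]
    targets = [Passage("t1", "The passage whose label is being measured.")]
    print(batch_prompt(persona, "what causes tides", primers, targets))
```

In this sketch, putting all passages in one batch is what allows the earlier (priming) passages to influence later judgments; the same target passages would then be judged once after high-relevance primers and once after low-relevance primers.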

Abstract

Recent research has explored LLMs as scalable tools for relevance labeling, but studies indicate they are susceptible to priming effects, where prior relevance judgments influence later ones. Although psychological theories link personality traits to such biases, it is unclear whether simulated personalities in LLMs exhibit similar effects. We investigate how Big Five personality profiles in LLMs influence priming in relevance labeling, using multiple LLMs on TREC 2021 and 2022 Deep Learning Track datasets. Our results show that certain profiles, such as High Openness and Low Neuroticism, consistently reduce priming susceptibility. Additionally, the most effective personality in mitigating priming may vary across models and task types. Based on these findings, we propose personality prompting as a method to mitigate threshold priming, connecting psychological evidence with LLM-based evaluation practices.
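Comparing personality profiles requires a per-profile priming score. The summary calls this the mean delta Δ but does not state its formula, so the sketch below assumes Δ is the mean, over target passages, of the label a passage receives after low-relevance primers minus the label it receives after high-relevance primers. Under this assumed sign convention, a positive Δ matches the usual threshold-priming pattern (more lenient judgments after weak primers), and a value near 0 indicates little priming. The function names `mean_delta` and `rank_profiles` and the toy labels are illustrative only and are not results from the paper.

```python
from statistics import mean


def mean_delta(after_low: dict[str, int], after_high: dict[str, int]) -> float:
    """Assumed definition of delta: average label difference for the same target
    passage judged after low-relevance vs. high-relevance primers."""
    shared = sorted(after_low.keys() & after_high.keys())
    if not shared:
        raise ValueError("no target passage was judged under both conditions")
    return mean(after_low[pid] - after_high[pid] for pid in shared)


def rank_profiles(
    runs: dict[str, tuple[dict[str, int], dict[str, int]]]
) -> list[tuple[str, float]]:
    """Order personality profiles from least to most priming (smallest |delta| first)."""
    deltas = {profile: mean_delta(low, high) for profile, (low, high) in runs.items()}
    return sorted(deltas.items(), key=lambda item: abs(item[1]))


if __name__ == "__main__":
    # Toy labels invented for illustration; not data from the paper.
    runs = {
        "no_persona": ({"t1": 3, "t2": 2}, {"t1": 1, "t2": 1}),
        "high_openness": ({"t1": 2, "t2": 2}, {"t1": 2, "t2": 1}),
    }
    for profile, delta in rank_profiles(runs):
        print(f"{profile}: mean delta = {delta:.2f}")
```

Ranking by the absolute value treats over-correction (a negative Δ) as priming too; whether the paper does the same is an assumption of this sketch.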

Paper Structure

This paper contains 21 sections, 2 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: An example of the methodology adopted in our experiment.