PRISM: A Methodology for Auditing Biases in Large Language Models

Leif Azzopardi, Yashar Moshfeghi

TL;DR

This paper presents PRISM, a flexible, inquiry-based methodology for auditing LLMs that seeks to elicit their positions indirectly, through task-based inquiry prompting, rather than by asking for those preferences directly.

Abstract

Auditing Large Language Models (LLMs) to discover their biases and preferences is an emerging challenge in creating Responsible Artificial Intelligence (AI). While various methods have been proposed to elicit the preferences of such models, LLM trainers have taken countermeasures, such that LLMs hide, obfuscate or point-blank refuse to disclose their positions on certain subjects. This paper presents PRISM, a flexible, inquiry-based methodology for auditing LLMs that seeks to elicit such positions indirectly, through task-based inquiry prompting, rather than through direct inquiry of said preferences. To demonstrate the utility of the methodology, we applied PRISM to the Political Compass Test, assessing the political leanings of twenty-one LLMs from seven providers. We show that LLMs, by default, espouse positions that are economically left and socially liberal (consistent with prior work). We also show the space of positions that these models are willing to espouse: some models are more constrained and less compliant than others, while others are more neutral and objective. In sum, PRISM can more reliably probe and audit LLMs to understand their preferences, biases and constraints.

Paper Structure

This paper contains 13 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: PRISM -- A Reliable Methodology for Auditing LLMs: The LLM is prompted to write an essay for each statement (with or without an assigned role). Each essay is rated for its level of agreement with the statement, and the ratings are tallied into a score that determines the LLM's position for the assigned role (or its default position).
  • Figure 2: Example prompting techniques that ask the LLM for its opinion or preference -- either forcing it to select an option or stance, or leaving the response unconstrained. Figure taken from Röttger et al. (2024) and updated to show that LLMs can also refuse to give an answer.
  • Figure 3: The default (no role) position of each LLM. Most LLMs, by default, espouse left and liberal-leaning positions. Mistral AI's Mistral model was the most left and liberal, while Cohere's Command-light model was the most right and authoritarian-leaning.
  • Figure 4: Windows of Political Preferences over different LLMs: GPT-4o provides the greatest capacity for espousing a wide variety of views, while Llama 2 provides the least capacity. Gemini 1.0 Pro's views of economic right positions tend to be centred on the compass, while Gemini 1.5 Pro's views on authoritarian positions are barely above zero on the y-axis.
  • Figure 5: Differences in positions when the LLM is told to assume the role of an Intelligent Agent (top-left), an Unintelligent Agent (top-right), a Fair Agent (bottom-left), and an Unfair Agent (bottom-right). "Intelligent" and "Fair" agents tend to be left and liberal of centre, though not as extreme as the default, whereas "Unfair" and "Unintelligent" agents tend to exhibit a much greater spread of positions.
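
The audit loop described in the Figure 1 caption (prompt for an essay per statement, rate each essay's stance, tally a score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `write_essay` and `rate_essay` are hypothetical callables standing in for the LLM under audit and the essay rater, and the stance labels, numeric scores, and the `axis`/`direction` fields are illustrative rather than the Political Compass Test's actual scoring.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed numeric coding of stances; the real test's weights may differ.
STANCE_SCORES = {
    "strongly disagree": -2,
    "disagree": -1,
    "agree": 1,
    "strongly agree": 2,
}


@dataclass
class Statement:
    text: str
    axis: str       # hypothetical label, e.g. "economic" or "social"
    direction: int  # +1 if agreement raises the axis score, -1 if it lowers it


def build_prompt(statement: Statement, role: Optional[str] = None) -> str:
    """Ask for an essay (a task) instead of asking for an opinion directly."""
    prefix = f"You are {role}. " if role else ""
    return f"{prefix}Write a short essay on the following statement: {statement.text}"


def audit(
    statements: List[Statement],
    write_essay: Callable[[str], str],       # hypothetical: prompt -> essay
    rate_essay: Callable[[str, str], str],   # hypothetical: (statement, essay) -> stance
    role: Optional[str] = None,
) -> Dict[str, int]:
    """Tally per-axis scores from the rated stance of each essay."""
    totals: Dict[str, int] = {}
    for s in statements:
        essay = write_essay(build_prompt(s, role))
        stance = rate_essay(s.text, essay)
        totals.setdefault(s.axis, 0)
        score = STANCE_SCORES.get(stance)
        if score is None:
            # Refusals or unclassifiable essays add nothing to the tally.
            continue
        totals[s.axis] += s.direction * score
    return totals
```

Running `audit` once with `role=None` and again with roles such as "a fair agent" would give a default position and role-conditioned positions that can be compared, in the spirit of Figures 3 to 5.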