Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks

Arjun Arunasalam, Madison Pickering, Z. Berkay Celik, Blase Ur

TL;DR

The paper examines which implicit values surface when AI assistants complete subjective everyday tasks, comparing six popular LLMs with 100 US crowdworkers. It introduces a task-auditing framework spanning 30 bounded decisions designed to elicit seven implicit values, and uses statistical analyses to show widespread divergence between LLMs and humans as well as heterogeneity among the models themselves. The key contributions are (i) the first large-scale audit of implicit values in everyday tasks, (ii) a multi-LLM and human comparison showing consistent misalignment, and (iii) robustness analyses and a practical discussion of personalization, elicitation, and safety guardrails for value-aligned AI assistants. The findings suggest that deployed AI assistants should reflect individual users' values rather than generic model-imposed preferences, which calls for user-specific value elicitation and ongoing auditing.

Abstract

Large language models (LLMs) can underpin AI assistants that help users with everyday tasks, such as by making recommendations or performing basic computation. Despite AI assistants' promise, little is known about the implicit values these assistants display while completing subjective everyday tasks. Humans may consider values like environmentalism, charity, and diversity. To what extent do LLMs exhibit these values in completing everyday tasks? How do they compare with humans? We answer these questions by auditing how six popular LLMs complete 30 everyday tasks, comparing LLMs to each other and to 100 human crowdworkers from the US. We find LLMs often do not align with humans, nor with other LLMs, in the implicit values exhibited.

Paper Structure

This paper contains 29 sections, 18 figures, 4 tables.

Figures (18)

  • Figure 1: How LLMs and humans completed tasks pertaining to financial priorities and environmentalism.
  • Figure 2: How LLMs and humans completed tasks pertaining to privacy, as well as diversity and inclusion.
  • Figure 3: How LLMs and humans completed tasks related to potential heterogeneity.
  • Figure 4: How LLMs and humans completed multiculturalism tasks.
  • Figure 5: How LLMs and humans completed tasks related to community and religion.
  • ...and 13 more figures