On the Pros and Cons of Active Learning for Moral Preference Elicitation

Vijay Keswani; Vincent Conitzer; Hoda Heidari; Jana Schaich Borg; Walter Sinnott-Armstrong

TL;DR

This work questions the effectiveness of active learning for eliciting moral preferences. It formalizes preferences with a utility function $u(x)$, observes responses via $R(x,x')$, and evaluates two active-learning approaches (version-space and Bayesian BALD) against random querying under challenges specific to moral judgments. Using simulations that introduce preference instability, model misspecification, and noise, the study shows that active learning can outperform random querying in small, stable, and well-specified regimes, but can underperform when instability or misspecification is pronounced or when feature dimensionality is high. The results stress the need for context-aware deployment and robust, adaptable elicitation strategies for moral judgments, especially in high-stakes domains like organ allocation or autonomous systems. The findings motivate cautious use of active learning and call for human-subject studies to quantify how often the moral-preference assumptions are violated in practice and how to design more resilient elicitation methods.

Abstract

Computational preference elicitation methods are tools used to learn people's preferences quantitatively in a given context. Recent works on preference elicitation advocate for active learning as an efficient method to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent's underlying preferences. In this work, we argue that the use of active learning for moral preference elicitation relies on certain assumptions about the underlying moral preferences, which can be violated in practice. Specifically, we highlight the following common assumptions: (a) preferences are stable over time and not sensitive to the sequence of presented queries, (b) the appropriate hypothesis class is chosen to model moral preferences, and (c) noise in the agent's responses is limited. While these assumptions can be appropriate for preference elicitation in certain domains, prior research in moral psychology suggests they may not be valid for moral judgments. Through a synthetic simulation of preferences that violate the above assumptions, we observe that active learning can have similar or worse performance than a basic random query selection method in certain settings. Yet, simulation results also demonstrate that active learning can still be viable if the degree of instability or noise is relatively small and when the agent's preferences can be approximately represented with the hypothesis class used for learning. Our study highlights the nuances associated with effective moral preference elicitation in practice and advocates for the cautious use of active learning as a methodology to learn moral preferences.
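
To make the comparison between active and random query selection concrete, the following is a minimal illustrative sketch, not the paper's Active-VS-PE or Active-Bayes-PE implementation. It assumes a linear utility $u(x) = w \cdot x$, a logistic response model for $R(x,x')$, and a particle approximation of the posterior over $w$. The "active" strategy here is plain uncertainty sampling (pick the pair whose predicted response is closest to 50/50), swapped in for BALD to keep the example short. The function names, the `beta` noise parameter, and all default values are hypothetical.

```python
import numpy as np

def respond(w, x, x_prime, rng, beta=5.0):
    """Assumed response model R(x, x'): returns 1 if x is chosen over x'.
    The choice is logistic in the utility gap u(x) - u(x'), with u(x) = w . x."""
    p = 1.0 / (1.0 + np.exp(-beta * (w @ (x - x_prime))))
    return int(rng.random() < p)

def run_elicitation(strategy, d=5, n_queries=30, n_particles=2000,
                    pool_size=200, beta=5.0, seed=0):
    """Simulate one elicitation run; return held-out pairwise accuracy."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    w_true /= np.linalg.norm(w_true)

    # Particle approximation of the posterior over utility weights.
    particles = rng.normal(size=(n_particles, d))
    particles /= np.linalg.norm(particles, axis=1, keepdims=True)
    log_post = np.zeros(n_particles)  # uniform prior over particles

    for _ in range(n_queries):
        # Fresh pool of candidate pairs (x, x') for this round.
        xs = rng.normal(size=(pool_size, d))
        xps = rng.normal(size=(pool_size, d))
        if strategy == "active":
            post = np.exp(log_post - log_post.max())
            post /= post.sum()
            # Posterior-predictive probability that x is preferred, per pair.
            p_each = 1.0 / (1.0 + np.exp(-beta * ((xs - xps) @ particles.T)))
            p_pred = p_each @ post
            # Uncertainty sampling: query the pair closest to 50/50.
            idx = int(np.argmin(np.abs(p_pred - 0.5)))
        else:
            idx = int(rng.integers(pool_size))  # random baseline

        x, xp = xs[idx], xps[idx]
        r = respond(w_true, x, xp, rng, beta)

        # Bayesian update: add the log-likelihood of the observed response.
        logits = beta * (particles @ (x - xp))
        signed = logits if r == 1 else -logits
        log_post += -np.logaddexp(0.0, -signed)  # log sigmoid(signed)

    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    w_hat = post @ particles  # posterior-mean estimate of the weights

    # Held-out evaluation against the noise-free true preference.
    test = rng.normal(size=(1000, d)) - rng.normal(size=(1000, d))
    return float(np.mean(np.sign(test @ w_hat) == np.sign(test @ w_true)))

if __name__ == "__main__":
    for strategy in ("random", "active"):
        print(strategy, run_elicitation(strategy, seed=1))
```

This sketch covers only the idealized case. The violations discussed in the abstract could be explored by, for example, changing `w_true` partway through the loop (instability), generating responses from a non-linear utility (misspecification), or lowering `beta` (noisier responses); which strategy does better in each case is an empirical question the sketch does not settle.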

Paper Structure (43 sections, 4 equations, 10 figures, 1 algorithm)

Introduction
Our Contributions
Preference instability.
Model misspecification.
Noisy responses.
Related Work
Algorithms for Preference Elicitation
Version-Space-based Active Learning.
Bayesian Active Learning.
Challenges to Modeling Moral Preferences
Preference instability.
Model misspecification.
Noisy responses.
Testing the Efficacy of Active Learning
Simulation setup.
...and 28 more sections

Figures (10)

Figure 1: Example of a pairwise comparison from the \cite{boerstler2024instability} study on kidney allocation decisions.
Figure 2: Performance for preference-change scenarios from Section \ref{sec:pref_change}. Active-Bayes-PE often performs better than Random-PE post-$t_{\text{change}}$ when $d{=}5$. However, in many cases (e.g., $d{=}10$, $t_{\text{change}}{=}20, 30$), both active learning algorithms have similar or worse performance than Random-PE.
Figure 3: Performance for model misspecification scenarios from Section \ref{sec:model_misspecify}. Active learning is more effective when the extent of model misspecification is small.
Figure 4: Performance for the noise models from Section \ref{sec:noise_analysis}. Active-Bayes-PE performs better than the random query baseline even with response noise. However, it fails to provide a similar improvement in most scenarios of preference noise.
Figure 5: Performance of Active-VS-PE, Active-Bayes-PE and Random-PE in an "idealized setting" (i.e., no assumption violations).
...and 5 more figures
