Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

Brian Hu; Bill Ray; Alice Leung; Amy Summerville; David Joy; Christopher Funk; Arslan Basharat

TL;DR

The paper addresses decision-making in medical triage where expert opinions may conflict and no single right answer exists. It introduces a DMA-labeled medical triage dataset and a zero-shot prompting framework, augmented by weighted self-consistency, to align LLMs to different decision-maker attributes. Across open-source models, alignment improves with model size and training techniques (e.g., RLHF), with Llama2-13B-Chat plus self-consistency achieving strong performance. The work provides an extensible open-source framework to study human-aligned decision-making in high-stakes settings and suggests future directions for modeling pluralistic human values in AI systems.

Abstract

In difficult decision-making scenarios, it is common to have conflicting opinions among expert human decision-makers as there may not be a single right answer. Such decisions may be guided by different attributes that can be used to characterize an individual's decision. We introduce a novel dataset for medical triage decision-making, labeled with a set of decision-maker attributes (DMAs). This dataset consists of 62 scenarios, covering six different DMAs, including ethical principles such as fairness and moral desert. We present a novel software framework for human-aligned decision-making by utilizing these DMAs, paving the way for trustworthy AI with better guardrails. Specifically, we demonstrate how large language models (LLMs) can serve as ethical decision-makers, and how their decisions can be aligned to different DMAs using zero-shot prompting. Our experiments focus on different open-source models with varying sizes and training techniques, such as Falcon, Mistral, and Llama 2. Finally, we also introduce a new form of weighted self-consistency that improves the overall quantified performance. Our results provide new research directions in the use of LLMs as alignable decision-makers. The dataset and open-source software are publicly available at: https://github.com/ITM-Kitware/llm-alignable-dm.

Paper Structure (37 sections, 2 equations, 11 figures, 3 tables)

Introduction
Related Work
Question-answering Benchmarks
LLM Reasoning and Prompt Engineering
LLM Alignment Approaches
Medical Triage Alignment Dataset
Approach
LLMs as Unaligned Decision-Makers
Alignment to Decision-Maker Attributes
Model Self-Consistency and Explainability
Evaluation Metric
Experiments
Unaligned vs. Aligned Model Results
Effect of Model Size
Effect of Model Training
...and 22 more sections

Figures (11)

Figure 1: An example scenario from our dataset, which consists of the context, a question, and labeled decision choices corresponding to high or low levels of a decision-maker attribute (risk aversion shown here). The AI decision-maker must choose the correct choice when aligned to a target attribute value. The scenarios in our dataset are designed to test one attribute at a time, although some scenario choices are labeled with multiple attributes.
Figure 2: Our approach for aligning LLMs to different DMAs. A scenario is presented to the model to produce an unaligned decision, which provides a measure of the model's implicit decision-making tendencies. To align the model to a particular DMA (e.g., fairness, shown here), we use a zero-shot alignment prompt as well as a form of weighted self-consistency. Weighted self-consistency samples the model multiple times using both high and low attribute prompts, and then takes a weighted majority vote over the chosen answers based on the target attribute value (e.g., positive weight for high fairness answers and negative weight for low fairness answers when aligning to high fairness). Self-consistency also produces reasoning traces that are used as a form of explanation.
Figure 3: Alignment accuracy reported for each attribute, with high (green) and low (red) target values shown for each on the opposite ends. Starting with 0% at the center, each concentric circle marks a 20% increment in the accuracy approaching 100%, the ideal value. (a) shows unaligned model performance, which provides a measure of the implicit decision-making tendencies of each model. (b) shows the proposed aligned + self-consistency model performance across different base models (Llama2, Falcon, and Mistral). The polygons with larger areas generally suggest better performance: (b) shows significantly improved alignment accuracy over (a); and (b) shows Llama2-13B-Chat and Mistral-7B-Instruct as the two most competitive models, consistent with Tab. \ref{tab:results}.
Figure 4: Comparison of Falcon-7B-Instruct's alignment accuracy, both high and low, across three configurations: unaligned, aligned, and aligned with self-consistency, in relation to various attributes.
Figure 5: Comparison of Mistral-7B-Instruct's alignment accuracy, both high and low, across three configurations: unaligned, aligned, and aligned with self-consistency, in relation to various attributes.
...and 6 more figures
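
The weighted self-consistency vote described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_self_consistency`, the `(prompt_level, choice)` sample format, and the +1/-1 weights are assumptions chosen for illustration, and the sampled answers are made-up placeholders rather than model outputs. Each sampled answer counts positively if it was produced under the prompt matching the target attribute value and negatively otherwise; the choice with the highest total is selected.

```python
from collections import defaultdict


def weighted_self_consistency(samples, target, pos_weight=1.0, neg_weight=-1.0):
    """Pick a choice by a weighted vote over sampled answers.

    samples: list of (prompt_level, choice) pairs, where prompt_level is
             "high" or "low" (the attribute prompt used for that sample).
    target:  "high" or "low", the attribute value we want to align to.
    The +1 / -1 default weights are an assumption for this sketch.
    """
    if target not in ("high", "low"):
        raise ValueError("target must be 'high' or 'low'")

    scores = defaultdict(float)
    for prompt_level, choice in samples:
        # Answers sampled under the target prompt vote for their choice;
        # answers sampled under the opposite prompt vote against it.
        scores[choice] += pos_weight if prompt_level == target else neg_weight

    # Ties are broken in favor of the choice seen first.
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical samples: three from the "high" prompt, three from the "low" prompt.
samples = [
    ("high", "A"), ("high", "A"), ("high", "B"),
    ("low", "B"), ("low", "B"), ("low", "A"),
]

choice_high, scores_high = weighted_self_consistency(samples, target="high")
choice_low, scores_low = weighted_self_consistency(samples, target="low")
print(choice_high, scores_high)
print(choice_low, scores_low)
```

With these placeholder samples, the same set of sampled answers yields different decisions depending only on the target attribute value, which is the property the caption describes.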
