Learning Personalized Decision Support Policies

Umang Bhatt; Valerie Chen; Katherine M. Collins; Parameswaran Kamalaruban; Emma Kallina; Adrian Weller; Ameet Talwalkar

TL;DR

The paper addresses the problem of personalizing AI-enabled decision support to improve decision outcomes for unseen decision-makers. It introduces Modiste, an interactive tool that treats decision support as a stochastic contextual bandit problem and learns per-user policies using LinUCB or KNN-UCB, enabling online customization of when and what form of support to present. Through computational simulations and real-user studies on CIFAR-3A and MMLU-2A, it demonstrates that personalization yields gains for decision-makers with varying expertise, reduces policy variance, and can defer to human judgment when appropriate. The work also formalizes the decision-support policy problem, provides an open-source implementation, and discusses regulatory and ethical considerations for practical deployment of personalized AI-assisted decision-making.

Abstract

Individual human decision-makers may benefit from different forms of support to improve decision outcomes, but when will each form of support yield better outcomes? In this work, we posit that personalizing access to decision support tools can be an effective mechanism for instantiating the appropriate use of AI assistance. Specifically, we propose the general problem of learning a decision support policy that, for a given input, chooses which form of support to provide to decision-makers for whom we initially have no prior information. We develop $\texttt{Modiste}$, an interactive tool to learn personalized decision support policies. $\texttt{Modiste}$ leverages stochastic contextual bandit techniques to personalize a decision support policy for each decision-maker and supports extensions to the multi-objective setting to account for auxiliary objectives like the cost of support. We find that personalized policies outperform offline policies and, in the cost-aware setting, reduce the incurred cost with minimal degradation to performance. Our experiments include various realistic forms of support (e.g., expert consensus and predictions from a large language model) on vision and language tasks. Our human subject experiments validate our computational experiments, demonstrating that personalization can yield benefits in practice for real users who interact with $\texttt{Modiste}$.
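To make the contextual bandit framing concrete, the following is a minimal sketch of a generic disjoint LinUCB learner that chooses a form of support per input by minimizing an optimistic loss estimate. It is not the $\texttt{Modiste}$ implementation: the class and function names, the two hypothetical forms of support (arm 0 for deciding alone, arm 1 for seeing a model prediction), the one-hot contexts, the `alpha` exploration parameter, and the synthetic decision-maker described by `mean_loss` are all assumptions made for illustration.

```python
import numpy as np


class LinUCB:
    """Generic disjoint LinUCB over forms of support (arms), minimizing loss."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed hyperparameter)
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm regularized Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm loss-weighted feature sum

    def _estimates(self, x):
        """Return (predicted loss, confidence width) for every arm at context x."""
        out = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            out.append((float((A_inv @ b) @ x), float(np.sqrt(x @ A_inv @ x))))
        return out

    def select(self, x):
        """Explore: pick the arm with the smallest lower confidence bound on loss."""
        return int(np.argmin([m - self.alpha * w for m, w in self._estimates(x)]))

    def greedy(self, x):
        """Exploit: pick the arm with the smallest predicted loss (no bonus)."""
        return int(np.argmin([m for m, _ in self._estimates(x)]))

    def update(self, arm, x, loss):
        """Ridge-regression update for the arm that was actually shown."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += loss * x


def run_simulation(mean_loss, n_rounds=500, seed=0):
    """Learn a policy online for one synthetic decision-maker.

    mean_loss[c, a] is the assumed probability that the decision-maker errs
    on an input of type c when given form of support a.
    """
    rng = np.random.default_rng(seed)
    n_contexts, n_arms = mean_loss.shape
    bandit = LinUCB(n_arms=n_arms, dim=n_contexts)
    losses = []
    for _ in range(n_rounds):
        c = int(rng.integers(n_contexts))
        x = np.eye(n_contexts)[c]            # one-hot context (illustrative embedding)
        arm = bandit.select(x)               # choose a form of support
        loss = float(rng.random() < mean_loss[c, arm])  # observe 0-1 loss
        bandit.update(arm, x, loss)
        losses.append(loss)
    return bandit, losses


# Hypothetical decision-maker: strong alone on input type 0, better with the model on type 1.
profile = np.array([[0.1, 0.4],
                    [0.5, 0.1]])
bandit, losses = run_simulation(profile, n_rounds=2000, seed=0)
```

After the run, `bandit.greedy(x)` reads out the learned policy for a context `x`. Because each decision-maker gets their own `LinUCB` instance, two people with different `mean_loss` profiles would generally end up with different policies, which is the personalization the abstract describes.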

Paper Structure (56 sections, 5 equations, 15 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Regulating AI Use.
Decision Support.
Prior Assumptions About Decision-Maker Information.
Preliminaries
General Problem Formulation.
Decision-Making Protocol.
Evaluation of $\pi$ via Expected Loss.
Modiste: Learning Personalized Decision Support Policies
Learning Problem
Modiste Interface
Expertise Profiles
Human-informed synthetic decision-makers.
Policies for each profile.
...and 41 more sections

Figures (15)

Figure 1: Depending on the input, decision-makers need different forms of decision support to make correct decisions. Modiste personalizes access to the right form of support at the right time for the right decision-maker online. Here, Alice would not benefit from model access, while Bob would not benefit from a senior consult.
Figure 2: We illustrate the process of learning a decision support policy $\pi_t$ online to improve a decision-maker $h$'s performance. Since assuming access to sufficient amounts of offline data is unreasonable in practice, our formulation learns a personalized policy online; each decision-maker's learned policy may differ from that of another decision-maker if they make different decisions ($\tilde{y}$) and thus have different expertise.
Figure 3: Example of the Modiste interface for MMLU-$2A$ where the human is provided responses from an LLM.
Figure 4: We report expected average loss $L_h(\pi)$ (lower is better) and standard error in the last 10 trials by Prolific participants for each algorithm, with CIFAR conditions on the left and MMLU conditions on the right. In the CIFAR setting, where individuals typically exhibit "varying" expertise profiles, we see significant benefits from using Modiste, particularly in the KNN setting. While we observe that most individuals in the MMLU condition exhibit "strictly better" expertise, which means personalized policies typically only perform as well as the best baseline, we still observe instances of deferred decisions to the human on a case-by-case basis; see Figure \ref{fig:hse-mmlu-topology}.
Figure 5: Snapshots of the learned decision support policies computed at the end of the study for 10 participants on the MMLU task. The forms of support are colored in t-SNE embedding space. All participants exhibit distinct policies across input space. The bar plot to the right of each scatter plot shows the relative performance of that decision-maker alone in each category, ordered from left to right as M=Mathematics, B=Biology, CS=Computer Science, FP=Foreign Policy per subplot. When a decision-maker performs well alone, Modiste learns policies to empower that decision-maker without LLM access. For example, the individual in the top left is highly competent at both Mathematics and Foreign Policy; the learned decision support policy reflects this.
...and 10 more figures
