Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices

Mallika Mainali, Harsha Sureshbabu, Anik Sen, Christopher B. Rauch, Noah D. Reifsnyder, John Meyer, J. T. Turner, Michael W. Floyd, Matthew Molineaux, Rosina O. Weber

TL;DR

Problem: aligning algorithmic decision-makers with human decision-making attributes in health-insurance choices under uncertainty. Approach: reimplements a classical AI DMA based on Molineaux et al. and an LLM-based DMA following Hu et al., using zero-shot prompts with weighted self-consistency and evaluating on a health-insurance benchmark with target risk levels $0.0$, $0.5$, and $1.0$. Contributions: a direct, controlled comparison under identical data, showing comparable alignment across classical AI and LLM-based DMs, with GPT-5 achieving the highest overall accuracy; code and dataset are released. Significance: demonstrates that context-specific DMA is feasible across domains, exposes trade-offs between structural stability and contextual adaptability, and guides future development of cognitively grounded, domain-aware algorithmic DMs.

Abstract

As algorithmic decision-makers are increasingly applied to high-stakes domains, AI alignment research has evolved from a focus on universal value alignment to context-specific approaches that account for decision-maker attributes. Prior work on Decision-Maker Alignment (DMA) has explored two primary strategies: (1) classical AI methods integrating case-based reasoning, Bayesian reasoning, and naturalistic decision-making, and (2) large language model (LLM)-based methods leveraging prompt engineering. While both approaches have shown promise in limited domains such as medical triage, their generalizability to novel contexts remains underexplored. In this work, we implement a prior classical AI model and develop an LLM-based algorithmic decision-maker evaluated using a large reasoning model (GPT-5) and a non-reasoning model (GPT-4) with weighted self-consistency under a zero-shot prompting framework, as proposed in recent literature. We evaluate both approaches on a health insurance decision-making dataset annotated for three target decision-makers with varying levels of risk tolerance (0.0, 0.5, 1.0). In the experiments reported herein, classical AI and LLM-based models achieved comparable alignment with attribute-based targets, with classical AI exhibiting slightly better alignment for a moderate risk profile. The dataset and open-source implementation are publicly available at: https://github.com/TeX-Base/ClassicalAIvsLLMsforDMAlignment and https://github.com/Parallax-Advanced-Research/ITM/tree/feature_insurance.

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 1 table, 2 algorithms.

Figures (3)

  • Figure 1: Schematic overview of the dataset structure, showing example probes, the contextual attributes of the decision-maker, the target decision-maker attribute (risk tolerance), and the four available choices. The ground truth indicates the most aligned option.
  • Figure 2: Implementation of the classical AI and LLM-based algorithmic DMs. Both models receive a scenario probe and output a final decision, which must align with the decision made by a target decision-maker of the same risk level. The classical AI algorithmic DM relies on prior cases, while the LLM-based DM uses zero-shot prompting and self-consistency sampling for aligned decision-making.
  • Figure 3: Performance of the three models across three targets with varying risk tolerances (Alex: $0.0$, highly risk-averse; Brie: $0.5$, moderately risk-averse; Chad: $1.0$, risk-tolerant). Bars show the alignment accuracy for each target, and the legend denotes the model.
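
The captions for Figures 2 and 3 mention self-consistency sampling and per-target alignment accuracy. The minimal sketch below illustrates one plausible reading of those two steps and is not the authors' implementation. It assumes that each sampled LLM response has already been reduced to a (choice, weight) pair, and that alignment accuracy is the fraction of probes where the final decision matches the ground truth. The function names, the weights, and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict


def weighted_vote(samples):
    """Return the choice with the largest total weight.

    `samples` is a list of (choice, weight) pairs, one per sampled
    response. How the weights are derived is an assumption here;
    the source only states that self-consistency is weighted.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    totals = defaultdict(float)
    for choice, weight in samples:
        totals[choice] += weight
    # Hypothetical tie-break: the alphabetically first label wins.
    return max(sorted(totals), key=lambda c: totals[c])


def alignment_accuracy(decisions, ground_truth):
    """Fraction of probes where the decision equals the ground truth."""
    if len(decisions) != len(ground_truth) or not ground_truth:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(d == g for d, g in zip(decisions, ground_truth))
    return matches / len(ground_truth)


# Illustrative values only; these are not results from the paper.
final = weighted_vote([("A", 0.2), ("B", 0.5), ("A", 0.4)])
score = alignment_accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"])
```

In this toy run, choice A wins the vote because its two samples together outweigh the single sample for B, even though B has the largest individual weight.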