Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices

Mallika Mainali; Harsha Sureshbabu; Anik Sen; Christopher B. Rauch; Noah D. Reifsnyder; John Meyer; J. T. Turner; Michael W. Floyd; Matthew Molineaux; Rosina O. Weber

TL;DR

Problem: aligning algorithmic decision-makers (DMs) with the attributes of human decision-makers in health-insurance choices under uncertainty. Approach: the authors reimplement a classical AI method for decision-maker alignment (DMA) based on Molineaux et al. and build an LLM-based DMA method following Hu et al., which uses zero-shot prompts with weighted self-consistency; both are evaluated on a health-insurance benchmark with target risk-tolerance levels $0.0$, $0.5$, and $1.0$. Contributions: a direct, controlled comparison on identical data, showing comparable alignment for the classical AI and LLM-based DMs, with GPT-5 achieving the highest overall accuracy; the code and dataset are released. Significance: the work demonstrates that context-specific DMA is feasible across domains, exposes trade-offs between structural stability and contextual adaptability, and guides future development of cognitively grounded, domain-aware algorithmic DMs.

Abstract

As algorithmic decision-makers are increasingly applied to high-stakes domains, AI alignment research has evolved from a focus on universal value alignment to context-specific approaches that account for decision-maker attributes. Prior work on Decision-Maker Alignment (DMA) has explored two primary strategies: (1) classical AI methods integrating case-based reasoning, Bayesian reasoning, and naturalistic decision-making, and (2) large language model (LLM)-based methods leveraging prompt engineering. While both approaches have shown promise in limited domains such as medical triage, their generalizability to novel contexts remains underexplored. In this work, we implement a prior classical AI model and develop an LLM-based algorithmic decision-maker evaluated using a large reasoning model (GPT-5) and a non-reasoning model (GPT-4) with weighted self-consistency under a zero-shot prompting framework, as proposed in recent literature. We evaluate both approaches on a health insurance decision-making dataset annotated for three target decision-makers with varying levels of risk tolerance (0.0, 0.5, 1.0). In the experiments reported herein, classical AI and LLM-based models achieved comparable alignment with attribute-based targets, with classical AI exhibiting slightly better alignment for a moderate risk profile. The dataset and open-source implementation are publicly available at: https://github.com/TeX-Base/ClassicalAIvsLLMsforDMAlignment and https://github.com/Parallax-Advanced-Research/ITM/tree/feature_insurance.
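The abstract names weighted self-consistency without giving its details. As a rough illustration of the general idea only, the sketch below aggregates several sampled answers by summing a weight per answer and picking the option with the largest total. The function name, the option labels, and the weights (including the negative weight) are hypothetical and are not taken from the paper or its repositories.

```python
from collections import defaultdict


def weighted_self_consistency(samples):
    """Return the choice with the highest summed weight.

    `samples` is an iterable of (choice, weight) pairs, one per sampled
    model response. How weights are assigned is an assumption here; the
    paper's actual scheme may differ.
    """
    totals = defaultdict(float)
    for choice, weight in samples:
        totals[choice] += weight
    # Iterate in sorted order so ties resolve deterministically
    # (max returns the first maximal element it encounters).
    return max(sorted(totals), key=lambda c: totals[c])


# Hypothetical sampled answers for one insurance-plan question.
samples = [
    ("plan_a", 1.0),
    ("plan_b", 1.0),
    ("plan_a", 1.0),
    ("plan_b", -1.0),  # assumed down-weighted vote against plan_b
    ("plan_c", 0.5),
]
best = weighted_self_consistency(samples)
print(best)
```

With these illustrative inputs the totals are 2.0 for `plan_a`, 0.0 for `plan_b`, and 0.5 for `plan_c`, so `plan_a` is selected.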
