The Average Patient Fallacy

Alaleh Azhir, Shawn N. Murphy, Hossein Estiri

TL;DR

Problem: Machine learning in medicine is optimized for population averages, which risks neglecting rare but high-impact cases. Approach: The paper formalizes the average patient fallacy, contrasts standard risk minimization with precision-oriented objectives, and introduces clinically weighted objectives, contextual optimization, and measurable rare-case metrics. Key contributions: Definitions of the Rare Case Performance Gap (RCPG), the Rare Case Calibration Error (RCCE), and the Rarity Index, plus a lambda-governed constrained optimization framework grounded in clinical consensus. Significance: The framework provides an auditable pathway to align AI with precision medicine, improving detection and treatment of rare presentations while maintaining ethical and practical safeguards.

Abstract

Machine learning in medicine is typically optimized for population averages. This frequency-weighted training privileges common presentations and marginalizes rare yet clinically critical cases, a bias we call the average patient fallacy. In mixture models, gradients from rare cases are suppressed by prevalence, creating a direct conflict with precision medicine. Clinical vignettes in oncology, cardiology, and ophthalmology show how this yields missed rare responders, delayed recognition of atypical emergencies, and underperformance on vision-threatening variants. We propose operational fixes: the Rare Case Performance Gap, the Rare Case Calibration Error, a prevalence-utility definition of rarity, and clinically weighted objectives that surface ethical priorities. Weight selection should follow structured deliberation. AI in medicine must detect exceptional cases because of their significance.
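
The idea of a rare-case performance gap can be illustrated with a short sketch. This summary does not reproduce the paper's formal definition of RCPG, so the code below assumes one plausible reading: sensitivity on the common subgroup minus sensitivity on the rare subgroup. The function names `subgroup_sensitivity` and `rare_case_gap` and the toy labels are hypothetical and chosen only for illustration.

```python
import numpy as np


def subgroup_sensitivity(y_true, y_pred, mask):
    """Sensitivity (recall on positives) restricted to the cases selected by mask."""
    positives = (y_true == 1) & mask
    if positives.sum() == 0:
        return float("nan")  # undefined when the subgroup has no positive cases
    return float((y_pred[positives] == 1).mean())


def rare_case_gap(y_true, y_pred, is_rare):
    """ASSUMED gap definition: common-subgroup sensitivity minus rare-subgroup sensitivity.

    This is an illustrative stand-in, not the paper's formal RCPG.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_rare = np.asarray(is_rare, dtype=bool)
    common = subgroup_sensitivity(y_true, y_pred, ~is_rare)
    rare = subgroup_sensitivity(y_true, y_pred, is_rare)
    return common - rare


# Toy data: four common positives and two rare positives (all hypothetical).
y_true = [1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 0]
is_rare = [False, False, False, False, True, True]
gap = rare_case_gap(y_true, y_pred, is_rare)
print(f"common-minus-rare sensitivity gap: {gap:.2f}")
```

Under this assumed definition, a positive gap means the model detects common presentations more reliably than rare ones, even when the pooled sensitivity looks acceptable.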

Paper Structure

This paper contains 13 sections, 15 equations, 2 figures.

Figures (2)

  • Figure 1: Mixture distribution of patient phenotypes (Equation \ref{eq:mixture}), showing the dominant common phenotype (blue, weight $1 - \pi = 0.9$) and rare phenotype (red, $\pi = 0.1$ for visibility; in practice, $\pi \ll 0.01$). The low weight of rare cases causes their underrepresentation in model optimization (Equation \ref{eq:gradient}) despite their high mutual information (Equation \ref{eq:mutual_info}).
  • Figure 2: Feature space showing common and rare clusters (Equations \ref{eq:objective} and \ref{eq:convergence}). Optimization is biased toward the common cluster, marginalizing rare cases. The distance between clusters represents not merely feature-space separation but potential differences in optimal treatment strategy: common diabetic retinopathy may respond to laser photocoagulation, while rare retinal vasculitis requires immunosuppression. The AI may achieve high overall accuracy (e.g., 87.2% sensitivity for more-than-mild DR) yet still perform poorly on rare variants, as suggested by subgroup analyses \cite{abramoff2020pivotal, grzybowski2020ai}.
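
The prevalence effect described in the two captions can be reproduced with a small simulation. The sketch below is not taken from the paper: it assumes a one-dimensional two-component Gaussian mixture (common phenotype centered at 0, rare phenotype centered at a hypothetical `rare_mean` of 4), a single-parameter model fit by mean squared error, and an illustrative prevalence of $\pi = 0.01$. The function name `simulate_mixture_fit` is likewise hypothetical.

```python
import numpy as np


def simulate_mixture_fit(pi, n=200_000, rare_mean=4.0, seed=0):
    """Fit one scalar parameter to a two-phenotype mixture by mean squared error.

    ASSUMPTIONS (for illustration only): common cases ~ N(0, 1),
    rare cases ~ N(rare_mean, 1), and prevalence pi for the rare phenotype.
    """
    rng = np.random.default_rng(seed)
    is_rare = rng.random(n) < pi
    x = np.where(is_rare,
                 rng.normal(rare_mean, 1.0, n),
                 rng.normal(0.0, 1.0, n))

    # The minimizer of the average squared error is the sample mean,
    # which sits close to the common cluster when pi is small.
    theta = x.mean()
    sq_err = (x - theta) ** 2

    # Gradient of the squared error with respect to the parameter,
    # evaluated with the parameter placed on the common cluster (0.0).
    grad = 2.0 * (0.0 - x)
    return {
        "theta": float(theta),
        "common_mse": float(sq_err[~is_rare].mean()),
        "rare_mse": float(sq_err[is_rare].mean()),
        # Average gradient of a single rare case.
        "rare_grad_per_case": float(grad[is_rare].mean()),
        # What the rare subgroup adds to the population-average gradient.
        "rare_grad_contribution": float(grad[is_rare].sum() / n),
    }


result = simulate_mixture_fit(pi=0.01)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this toy setting each rare case carries a large gradient, but its contribution to the population-average gradient is scaled down by roughly the prevalence, so the fitted parameter stays near the common cluster and the error on rare cases is far larger than on common ones.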