Mapping Patient-Perceived Physician Traits from Nationwide Online Reviews with LLMs

Junjie Luo, Rui Han, Arshana Welivita, Zeleikun Di, Jingfu Wu, Xuzhe Zhi, Ritu Agarwal, Gordon Gao

TL;DR

This study introduces a scalable, transparent pipeline that uses large language models to translate millions of online physician reviews into ten interpretable traits (the Big Five plus five healthcare-specific judgments). The authors validate the approach through multi-model comparisons and human benchmarking, reporting strong agreement with expert judgments (correlation coefficients 0.72–0.89) and external validity against patient satisfaction (r up to 0.81). At national scale, the work reveals systematic patterns by gender and specialty and identifies four physician archetypes that relate to patient-perceived quality. The findings offer a bias-aware framework for measuring physician-patient relationships, with implications for quality measurement, workforce development, and potential patient-provider matching strategies. Noted limitations include the representativeness of online reviews and the cross-sectional design.

Abstract

Understanding how patients perceive their physicians is essential to improving trust, communication, and satisfaction. We present a large language model (LLM)-based pipeline that infers Big Five personality traits and five patient-oriented subjective judgments. The analysis encompasses 4.1 million patient reviews of 226,999 U.S. physicians from an initial pool of one million. We validate the method through multi-model comparison and human expert benchmarking, achieving strong agreement between human and LLM assessments (correlation coefficients 0.72–0.89) and external validity through correlations with patient satisfaction (r = 0.41–0.81, all p < 0.001). National-scale analysis reveals systematic patterns: male physicians receive higher ratings across all traits, with the largest disparities in clinical competence perceptions; empathy-related traits predominate in pediatrics and psychiatry; and all traits positively predict overall satisfaction. Cluster analysis identifies four distinct physician archetypes, from "Well-Rounded Excellent" (33.8%, uniformly high traits) to "Underperforming" (22.6%, consistently low traits). These findings demonstrate that automated trait extraction from patient narratives can provide interpretable, validated metrics for understanding physician-patient relationships at scale, with implications for quality measurement, bias detection, and workforce development in healthcare.
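
The human-versus-LLM agreement reported in the abstract is a correlation between two sets of scores for the same items. The abstract does not say which coefficient was used, so the sketch below assumes Pearson's r. The function name `pearson_r` and the example scores are hypothetical illustrations, not the authors' code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 0-1 trait scores for five physicians (NOT data from the paper).
human_scores = [0.9, 0.2, 0.6, 0.8, 0.4]
llm_scores = [0.8, 0.3, 0.5, 0.9, 0.4]

r = pearson_r(human_scores, llm_scores)
print(f"human-LLM agreement for one trait: r = {r:.2f}")
```

Running the same computation once per trait would give one coefficient per trait, which is one plausible way to arrive at a reported range such as 0.72–0.89.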

Paper Structure

This paper contains 18 sections, 7 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Distribution of personality trait scores. Each subplot shows the kernel density estimate for one of the 10 traits measured on a 0-1 scale.
  • Figure 2: Correlation matrix of the 10 physician traits. Darker colors indicate stronger correlations. Notable patterns include high correlations between interpersonal traits (IQC-SPS: r=0.928) and between agreeableness and patient-oriented measures.
  • Figure 3: Quality validation of trait extraction methodology. (a) Sufficiency scores show U-shaped patterns where extreme trait scores (0 or 1) demonstrate higher evidence adequacy compared to moderate scores. (b) Consistency patterns reveal that physicians with clearly defined traits (high or low scores) generate more convergent patient assessments, while moderate scores reflect appropriate uncertainty when traits are ambiguous. Data points represent mean values with standard error of the mean (SEM) calculated across all traits and physicians (n = 226,999). Trend lines show polynomial regression fits with 95% confidence bands.
  • Figure 4: Gender differences in physician trait scores. Male physicians receive higher ratings across all 10 traits, with the largest differences in patient-oriented subjective judgments (PCC, SCO, STS) rather than personality traits. Error bars represent 95% confidence intervals calculated from standard errors. Statistical significance was determined using Welch's t-test for unequal variances. Sample sizes vary by trait ($n_{\text{male}}$ = 85,714–125,293; $n_{\text{female}}$ = 42,443–61,472; total $n$ = 128,157–186,669).
  • Figure 5: Specialty differences in physician traits. (a) Big Five personality trait scores by medical specialty show distinct patterns, with surgical specialties generally demonstrating higher conscientiousness and emotional stability. (b) Professional competency measures reveal specialty-specific strengths, with surgical specialties excelling in perceived clinical competence while patient-oriented specialties show higher interpersonal qualities.
  • ...and 3 more figures
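
The Figure 4 caption names Welch's t-test for unequal variances as the significance test for the gender comparison. A minimal sketch of that statistic using only the Python standard library follows. The function name `welch_t` and the two score lists are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 observations")
    # Squared standard error of each group mean: sample variance / n.
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical 0-1 trait scores for two groups (NOT data from the paper).
group_a = [0.8, 0.7, 0.9, 0.6]
group_b = [0.5, 0.6, 0.4, 0.7, 0.3]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t-test, this version does not pool the two variances, which suits groups of different sizes such as those listed in the caption. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns a p-value.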