Mapping Patient-Perceived Physician Traits from Nationwide Online Reviews with LLMs
Junjie Luo, Rui Han, Arshana Welivita, Zeleikun Di, Jingfu Wu, Xuzhe Zhi, Ritu Agarwal, Gordon Gao
TL;DR
This study introduces a scalable, transparent pipeline that uses large language models to translate millions of online physician reviews into ten interpretable traits (the Big Five plus five healthcare-specific judgments). The authors validate the approach through multi-model comparisons and human benchmarking, achieving strong alignment with expert judgments (correlations of 0.72–0.89) and high external validity against patient satisfaction (correlations up to 0.81). At national scale, the work reveals systematic patterns by gender and specialty and identifies four physician archetypes that relate to patient-perceived quality. The findings offer a novel, bias-aware framework for measuring physician-patient relationships, with implications for quality measurement, workforce development, and potential patient-provider matching strategies. The authors also note limitations related to the representativeness of online reviews and the cross-sectional design.
Abstract
Understanding how patients perceive their physicians is essential to improving trust, communication, and satisfaction. We present a large language model (LLM)-based pipeline that infers Big Five personality traits and five patient-oriented subjective judgments. The analysis encompasses 4.1 million patient reviews of 226,999 U.S. physicians, drawn from an initial pool of one million physicians. We validate the method through multi-model comparison and human expert benchmarking, achieving strong agreement between human and LLM assessments (correlation coefficients 0.72–0.89) and external validity through correlations with patient satisfaction (r = 0.41–0.81, all p < 0.001). National-scale analysis reveals systematic patterns: male physicians receive higher ratings across all traits, with the largest disparities in perceived clinical competence; empathy-related traits predominate in pediatrics and psychiatry; and all traits positively predict overall satisfaction. Cluster analysis identifies four distinct physician archetypes, ranging from "Well-Rounded Excellent" (33.8%, uniformly high traits) to "Underperforming" (22.6%, consistently low traits). These findings demonstrate that automated trait extraction from patient narratives can provide interpretable, validated metrics for understanding physician-patient relationships at scale, with implications for quality measurement, bias detection, and workforce development in healthcare.
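The human-versus-LLM agreement check summarized above can be illustrated with a minimal sketch. It assumes per-trait Pearson correlation between human and LLM ratings of the same physicians; the abstract does not state which correlation coefficient was used, and the function names, trait keys, and toy ratings below are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def agreement_by_trait(human: dict[str, list[float]],
                       llm: dict[str, list[float]]) -> dict[str, float]:
    """Per-trait correlation between human and LLM ratings (hypothetical helper)."""
    return {trait: pearson_r(human[trait], llm[trait]) for trait in human}


# Hypothetical toy ratings for 5 physicians on 2 traits (NOT study data).
human = {
    "agreeableness": [4.0, 3.5, 2.0, 4.5, 3.0],
    "clinical_competence": [4.5, 4.0, 2.5, 5.0, 3.5],
}
llm = {
    "agreeableness": [3.8, 3.6, 2.4, 4.4, 2.9],
    "clinical_competence": [4.6, 3.7, 2.8, 4.9, 3.3],
}

if __name__ == "__main__":
    for trait, r in agreement_by_trait(human, llm).items():
        print(f"{trait}: r = {r:.2f}")
```

In this sketch each trait is scored independently, so a weak trait cannot be masked by strong agreement on another; the same pattern could be reused for the external-validity check by replacing the human ratings with patient satisfaction scores.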
