Computational Phenomenology of Borderline Personality Disorder: A Comparative Evaluation of LLM-Simulated Expert Personas and Human Clinical Experts
Marcin Moskalewicz, Anna Sterna, Karolina Drożdż, Kacper Dudzic, Marek Pokropski, Paula Flores
TL;DR
This work investigates whether large language models can reproduce and augment human phenomenological analysis of lived experience in Borderline Personality Disorder. In a three-study mixed-method design, GPT-4o, Gemini Pro, and Claude Opus are compared with human experts on semantic congruence, content validity, and perceived authorship, complemented by computational embeddings and non-expert public judgments. Results show variable but notable semantic overlap: Gemini often aligns closely with the human analyses, Claude sometimes achieves a higher perceived semantic match in public ratings, and the models also identify themes that human analysts may overlook. The findings suggest that AI-augmented thematic analysis can mitigate interpretive bias and enhance sensitivity, though robust human scrutiny remains essential given the variability and stylistic differences across models. Overall, AI can safely support qualitative analysis by handling data scale and pattern-seeking while human interpreters retain final judgment and responsibility for phenomenological validity.
Abstract
Building on a human-led thematic analysis of life-story interviews with inpatients with Borderline Personality Disorder, this study examines the capacity of large language models (OpenAI's GPT, Google's Gemini, and Anthropic's Claude) to support qualitative clinical analysis. The models were evaluated through a mixed procedure. Study A involved blinded and non-blinded expert judges in phenomenology and clinical psychology. Assessments included semantic congruence, Jaccard coefficients for the overlap of outputs, and multidimensional validity ratings covering the credibility, coherence, and substantiveness of the results and their grounding in the qualitative data. In Study B, neural methods were used to embed the theme descriptions produced by the humans and the models in a two-dimensional vector space, providing a computational measure of the difference between human and model semantics and linguistic style. In Study C, complementary non-expert evaluations examined the influence of thematic verbosity on the perception of human authorship and content validity. All three studies revealed variable overlap with the human analysis: the models' outputs were partly indistinguishable from those of the human researchers, and the models also identified themes that the human researchers had originally omitted. The findings highlight both the variability and the potential of AI-augmented thematic qualitative analysis to mitigate human interpretative bias and enhance sensitivity.
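As a brief illustration of the overlap measure named above, the Jaccard coefficient of two theme sets A and B is |A ∩ B| / |A ∪ B|. The sketch below is not the study's procedure: it assumes that human and model themes have already been mapped onto shared labels (the abstract does not say how that matching was done), and the labels and the function name `jaccard` are placeholders invented for this example.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| for two collections of theme labels."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Placeholder labels, invented for illustration; they are not themes from the study.
human_themes = {"theme_1", "theme_2", "theme_3", "theme_4"}
model_themes = {"theme_2", "theme_3", "theme_4", "theme_5", "theme_6"}

# 3 shared labels out of 6 distinct labels overall.
print(jaccard(human_themes, model_themes))  # 0.5
```

A value of 1 means the two analyses yield exactly the same set of themes and 0 means they share none; the coefficient says nothing about how well the shared themes are described, which is why it is paired with expert ratings.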
