Leveraging Large Language Models for Analyzing Blood Pressure Variations Across Biological Sex from Scientific Literature
Yuting Guo, Seyedeh Somayyeh Mousavi, Reza Sameni, Abeed Sarker
TL;DR
This work addresses biases in blood pressure (BP) measurement by using a large language model to extract sex-specific BP statistics from the biomedical literature. Using zero-shot prompting with GPT-3.5-turbo, the authors extract the mean and standard deviation of systolic (SBP) and diastolic (DBP) blood pressure for males and females from 993 PubMed abstracts (selected from ~25 million) and analyze distributional differences via heatmaps and Gaussian mixture models. The study demonstrates the feasibility of large-scale, literature-based BP analysis and finds that males generally exhibit higher BP than females, while also noting limitations such as potential hallucinations and reliance on abstracts rather than full texts. The approach offers a scalable way to assemble heterogeneous BP datasets from published studies, potentially informing clinical benchmarks and demographic considerations in BP assessment.
Abstract
Hypertension, defined as blood pressure (BP) that is above normal, is a major public health concern: it is a critical precursor to various cardiovascular diseases (CVDs) and contributes significantly to mortality worldwide. However, many existing BP measurement technologies and standards might be biased because they do not consider clinical outcomes, comorbidities, or demographic factors, making them inconclusive for diagnostic purposes. Data-driven research on how BP measurements vary across these variables is limited. In this work, we employed GPT-35-turbo, a large language model (LLM), to automatically extract the mean and standard deviation of BP for both males and females from a dataset of 25 million abstracts sourced from PubMed. Of these, 993 abstracts met our predefined inclusion criteria (i.e., they mention blood pressure, a unit of blood pressure such as mmHg, and biological sex). Based on the information automatically extracted from these articles, we analyzed the variation of BP values across biological sex. Our results show the viability of using LLMs to study BP variation across demographic factors.
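The inclusion criteria described in the abstract (a reference to blood pressure, a BP unit such as mmHg, and a mention of biological sex) could be approximated with a simple keyword filter. The following is a minimal sketch, not the authors' implementation: the regular expressions, the function name `meets_inclusion_criteria`, and the three toy abstracts are all assumptions made for illustration, and the numbers in the toy strings are invented rather than taken from the study.

```python
import re

# Hypothetical patterns: the abstract does not specify the exact matching rules.
BP_TERM = re.compile(r"\bblood pressure\b|\bSBP\b|\bDBP\b", re.IGNORECASE)
BP_UNIT = re.compile(r"\bmm\s?Hg\b", re.IGNORECASE)
SEX_TERM = re.compile(r"\b(?:males?|females?|men|women|sex)\b", re.IGNORECASE)


def meets_inclusion_criteria(abstract: str) -> bool:
    """Return True if the text mentions BP, a BP unit, and biological sex."""
    return all(p.search(abstract) for p in (BP_TERM, BP_UNIT, SEX_TERM))


# Toy abstracts for illustration only (invented text, not study data).
abstracts = [
    "Mean systolic blood pressure was 128 mmHg in men and 122 mmHg in women.",
    "Blood pressure was recorded at baseline in all participants.",
    "Women reported higher adherence than men.",
]

# Keep only the abstracts that satisfy all three criteria.
selected = [a for a in abstracts if meets_inclusion_criteria(a)]
print(len(selected), "of", len(abstracts), "abstracts selected")
```

Only the first toy abstract satisfies all three conditions. A filter of this kind would run before the LLM extraction step, so that only candidate abstracts are sent to the model.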
