Leveraging Social Determinants of Health in Alzheimer's Research Using LLM-Augmented Literature Mining and Knowledge Graphs
Tianqi Shang, Shu Yang, Weiqing He, Tianhua Zhai, Dawei Li, Bojian Hou, Tianlong Chen, Jason H. Moore, Marylyn D. Ritchie, Li Shen
TL;DR
The paper examines how social determinants of health (SDoH) influence Alzheimer's disease (AD) risk and progression, a relationship that is hard to study because data are scarce and sources are heterogeneous. It introduces an automated pipeline that uses a large language model (GPT-4o) and NLP techniques to extract SDoH concepts from PubMed abstracts as triplets, then integrates those triplets with a PrimeKG biomedical subgraph to build an SDoH-augmented knowledge graph. The authors evaluate the augmented graph through link prediction with graph convolutional networks (GCNs), reporting improvements over an SDoH-free baseline across seven relations and several AD-relevant genes, with several improvements reaching statistical significance (e.g., for TREM2, BIN1, CR1, SPI1, INPP5D, ABI3, and C7). They also predict novel gene–SDoH and gene–gene edges and validate a subset by PubMed co-occurrence, which suggests the framework can reveal actionable insights and generalize to other SDoH-related research. Overall, the approach offers a scalable tool for knowledge discovery in AD and beyond, using literature-derived SDoH to support mechanistic understanding and potential interventions.
Abstract
Growing evidence suggests that social determinants of health (SDoH), a set of nonmedical factors, affect individuals' risks of developing Alzheimer's disease (AD) and related dementias. Nevertheless, the etiological mechanisms underlying such relationships remain largely unclear, mainly due to difficulties in collecting relevant information. This study presents a novel, automated framework that leverages recent advances in large language models (LLMs) and natural language processing techniques to mine SDoH knowledge from extensive literature and integrate it with AD-related biological entities extracted from the general-purpose knowledge graph PrimeKG. Using graph neural networks, we performed link prediction tasks to evaluate the resultant SDoH-augmented knowledge graph. Our framework shows promise for enhancing knowledge discovery in AD and can be generalized to other SDoH-related research areas, offering a new tool for exploring the impact of social determinants on health outcomes. Our code is available at: https://github.com/hwq0726/SDoHenPKG
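
The integration step described above, merging literature-derived SDoH triplets into an existing biomedical subgraph, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`normalize`, `augment_graph`), the relation labels, and the toy triplets are hypothetical placeholders chosen for illustration, and the real pipeline (see the repository linked above) may normalize entities and relations quite differently.

```python
from typing import Iterable, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def normalize(triplet: Triplet) -> Triplet:
    """Lowercase and trim each field so near-duplicate triplets collapse.

    Assumption: simple string normalization stands in for real entity linking.
    """
    head, relation, tail = triplet
    return (
        head.strip().lower(),
        relation.strip().lower().replace(" ", "_"),
        tail.strip().lower(),
    )


def augment_graph(
    base_edges: Iterable[Triplet], sdoh_triplets: Iterable[Triplet]
) -> Tuple[Set[Triplet], Set[str]]:
    """Merge SDoH triplets into a base edge set.

    Returns the merged edge set and the set of nodes that were not
    present in the base graph.
    """
    merged: Set[Triplet] = {normalize(t) for t in base_edges}
    base_nodes = {node for h, _, t in merged for node in (h, t)}

    added: List[Triplet] = []
    for triplet in sdoh_triplets:
        norm = normalize(triplet)
        if norm not in merged:  # skip edges already in the graph
            merged.add(norm)
            added.append(norm)

    new_nodes = {node for h, _, t in added for node in (h, t)} - base_nodes
    return merged, new_nodes


# Toy placeholder data (hypothetical, not extracted output from the paper).
base = [
    ("TREM2", "associated_with", "Alzheimer disease"),
    ("BIN1", "associated_with", "Alzheimer disease"),
]
sdoh = [
    ("education level", "affects", "Alzheimer disease"),
    (" Education Level", "Affects", "alzheimer disease"),  # duplicate after normalization
    ("social isolation", "affects", "Alzheimer disease"),
]

merged_edges, new_nodes = augment_graph(base, sdoh)
print(len(merged_edges), sorted(new_nodes))
```

In this sketch the merged edge set would then be the input to a link prediction model. Comparing performance with and without the SDoH edges mirrors, in spirit, the evaluation against an SDoH-free baseline described in the abstract.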
