QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis
Chao Hui Huang
TL;DR
QuST-LLM addresses the interpretability challenge of spatial transcriptomics (ST) by translating high-dimensional spatial gene expression into human-readable biological narratives. It extends QuPath via QuST to provide end-to-end ST data loading, region-of-interest (ROI) selection, gene ontology (GO) enrichment analysis, and LLM-driven interpretation. The framework supports forward analysis, based on key genes and comparative expression, and backward analysis, which maps natural-language descriptions to spatial regions; both are demonstrated in examples using GPT-4 and GOATOOLS. Quantitative validation reports ROC AUC values (e.g., 0.94) that indicate strong alignment between language prompts and spatial patterns, underscoring the tool's potential to make spatial biology more interpretable and accessible.
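To make the ROC AUC figure concrete, the following is a minimal sketch of how such a score can be computed when each cell (or spot) has a binary label, such as membership in an annotated region, and a continuous score, such as a prompt-derived similarity. The function name `roc_auc` and the toy inputs are illustrative assumptions and are not taken from the QuST-LLM source; the computation itself is the standard rank-sum (Mann-Whitney U) identity for AUC.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum identity, using average ranks for tied scores.

    labels: iterable of 0/1 ground-truth values (hypothetical region membership)
    scores: iterable of floats (hypothetical prompt-derived scores)
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks; tied scores share the average of their ranks.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    # Mann-Whitney U for the positive class, normalized to [0, 1].
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example: three cells inside the region, three outside.
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.2, 0.6, 0.1])
```

An AUC of 1.0 means every in-region cell scores higher than every out-of-region cell, while 0.5 corresponds to chance-level ranking.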
Abstract
In this paper, we introduce QuST-LLM, an innovative extension of QuPath that uses large language models (LLMs) to analyze and interpret spatial transcriptomics (ST) data. QuST-LLM simplifies the intricate, high-dimensional nature of ST data by offering a comprehensive workflow that includes data loading, region selection, gene expression analysis, and functional annotation. It also employs LLMs to transform complex ST data into understandable and detailed biological narratives based on gene ontology annotations, thereby significantly improving the interpretability of ST data. Consequently, users can interact with their own ST data using natural language. QuST-LLM thus gives researchers a powerful means to unravel the spatial and functional complexities of tissues, fostering novel insights and advances in biomedical research. QuST-LLM is part of the QuST project. The source code and documentation are hosted on GitHub at https://github.com/huangch/qust.
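As a rough illustration of how gene ontology annotations can be turned into text for an LLM, the sketch below runs a one-sided hypergeometric enrichment test on a set of key genes and formats the significant terms as a prompt. It is a simplified stand-in under stated assumptions: it uses neither the GOATOOLS API nor the QuST-LLM code, it omits multiple-testing correction, and the function names (`hypergeom_pvalue`, `enrich`, `build_prompt`), the prompt wording, and the toy annotations are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(key_genes, background, term_to_genes, alpha=0.05):
    """Return (term, p-value) pairs with p < alpha, most significant first."""
    population = set(background)
    study = set(key_genes) & population
    hits = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & population
        overlap = len(study & annotated)
        if overlap == 0:
            continue
        p = hypergeom_pvalue(overlap, len(study), len(annotated), len(population))
        if p < alpha:
            hits.append((term, p))
    return sorted(hits, key=lambda hit: hit[1])


def build_prompt(hits, region_name="selected region"):
    """Format enriched terms as a plain-text prompt (hypothetical wording)."""
    lines = [f"- {term} (p = {p:.3g})" for term, p in hits]
    return (
        f"The following GO terms are enriched in the {region_name}:\n"
        + "\n".join(lines)
        + "\nSummarize the likely biology of this region in plain language."
    )


# Toy data: 20 background genes and two made-up annotation sets.
background = [f"g{i}" for i in range(20)]
toy_terms = {
    "toy term A": ["g0", "g1", "g2", "g3"],
    "toy term B": ["g0"] + [f"g{i}" for i in range(10, 19)],
}
hits = enrich(["g0", "g1", "g2"], background, toy_terms)
prompt = build_prompt(hits)
```

In practice the enrichment step would be delegated to a dedicated tool such as GOATOOLS, and the resulting prompt would be sent to the chosen LLM; the sketch only shows the shape of the data passed between the two steps.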
