Large Language Model Aided Birt-Hogg-Dube Syndrome Diagnosis with Multimodal Retrieval-Augmented Generation

Haoqing Li, Jun Shi, Xianmeng Chen, Qiwei Jia, Rui Wang, Wei Wei, Hong An, Xiaowen Hu

TL;DR

This work tackles the difficulty of diagnosing Birt-Hogg-Dubé syndrome (BHD) from CT when data are scarce and cystic lung diseases share similar imaging features. It introduces BHD-RAG, a multimodal retrieval-augmented generation framework that builds a DCLD corpus, trains a cosine-space retriever with angular-margin learning, and uses retrieval-augmented generation to fuse retrieved evidence with imaging data in an MLLM to produce a diagnosis. Key contributions include (i) a domain-specific corpus of image-description pairs refined by experts, (ii) a cosine-space retriever that emphasizes discriminative features among DCLDs, and (iii) a retrieval-augmented generator that yields evidence-based, clinically aligned diagnoses. Evaluations on four DCLDs show that BHD-RAG substantially improves diagnostic accuracy and offers explainable imaging descriptions, indicating potential to reduce hallucinations in medical MLLMs. Future work will expand to multi-center data to develop a multimodal foundation model for DCLDs.
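The summary above mentions angular-margin learning in cosine space but does not state the exact loss. As a minimal sketch, assuming an additive angular margin of the ArcFace style (an assumption, not confirmed by this page), the margin is added to the angle between an embedding and its true-class prototype before scaling. The function names and the `margin` and `scale` values below are illustrative only.

```python
import math


def angular_margin_logits(cosines, target_index, margin=0.5, scale=30.0):
    """Return scaled logits with an additive angular margin on the target class.

    cosines: cosine similarities between one embedding and each class prototype.
    margin and scale are illustrative defaults, not values from the paper.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if i == target_index:
            # Penalize the true class: widen its angle by the margin,
            # capped at pi so the cosine stays monotonic.
            theta = min(math.acos(c) + margin, math.pi)
            c = math.cos(theta)
        logits.append(scale * c)
    return logits


def cross_entropy(logits, target_index):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


# Toy similarities to four hypothetical class prototypes (BHD, LAM, PLCH, LIP).
toy_cosines = [0.8, 0.3, 0.1, -0.2]
with_margin = angular_margin_logits(toy_cosines, target_index=0)
without_margin = angular_margin_logits(toy_cosines, target_index=0, margin=0.0)
print(cross_entropy(with_margin, 0) > cross_entropy(without_margin, 0))
```

Because the margin lowers only the true-class logit, the loss stays higher until the embedding sits well inside its class's angular region, which is one way a retriever can be pushed to separate visually similar classes.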

Abstract

Deep learning methods face dual challenges of limited clinical samples and low inter-class differentiation among Diffuse Cystic Lung Diseases (DCLDs) in advancing Birt-Hogg-Dubé syndrome (BHD) diagnosis via Computed Tomography (CT) imaging. While Multimodal Large Language Models (MLLMs) demonstrate diagnostic potential for such rare diseases, the absence of domain-specific knowledge and referable radiological features intensifies hallucination risks. To address this problem, we propose BHD-RAG, a multimodal retrieval-augmented generation framework that integrates DCLD-specific expertise and clinical precedents with MLLMs to improve BHD diagnostic accuracy. BHD-RAG employs: (1) a specialized agent that generates imaging-manifestation descriptions of CT images to construct a multimodal corpus of DCLD cases, (2) a cosine similarity-based retriever that pinpoints relevant image-description pairs for query images, and (3) an MLLM that synthesizes retrieved evidence with imaging data for diagnosis. BHD-RAG is validated on a dataset involving four types of DCLDs, achieving superior accuracy and generating evidence-based descriptions closely aligned with expert insights.
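The retrieval and prompting steps described in the abstract can be illustrated with a minimal sketch. It assumes image embeddings are already available as plain vectors; the corpus entries, embedding values, function names, and prompt wording are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, corpus, k=2):
    """Return the k (score, entry) pairs most similar to the query."""
    scored = [(cosine_similarity(query_embedding, e["embedding"]), e) for e in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(retrieved):
    """Assemble retrieved image-description pairs into text for an MLLM."""
    lines = ["Reference cases retrieved by image similarity:"]
    for rank, (score, entry) in enumerate(retrieved, start=1):
        lines.append(f"{rank}. [{entry['label']}] {entry['description']} (similarity {score:.2f})")
    lines.append("Using these references and the query CT image, describe the findings and give a diagnosis.")
    return "\n".join(lines)


# Toy corpus: embeddings are made-up 3-d vectors, descriptions are abbreviated.
toy_corpus = [
    {"label": "BHD", "description": "Cysts of variable size, elliptical or flattened.", "embedding": [0.9, 0.1, 0.0]},
    {"label": "LAM", "description": "Round cysts of similar size.", "embedding": [0.1, 0.9, 0.0]},
    {"label": "LIP", "description": "Cysts of varying size with a perivascular pattern.", "embedding": [0.7, 0.0, 0.7]},
]

top = retrieve_top_k([1.0, 0.0, 0.0], toy_corpus, k=2)
print(build_prompt(top))
```

In a real pipeline the prompt would be sent to the MLLM together with the query image. The number of retrieved cases `k` is a tunable parameter, and Figure 4(a) reports its influence on BHD-RAG.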

Paper Structure

This paper contains 10 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Representative CT imaging examples of four challenging-to-differentiate DCLDs: (a) BHD, characterized by cysts of variable size, elliptical or flattened, predominantly located in the lower lungs and subpleural regions; (b) LAM, featuring round cysts of similar size; (c) PLCH, predominantly located in the lower lungs and subpleural regions; (d) LIP, with cysts that vary in size, lie predominantly in the bilateral lung bases, and follow a perivascular pattern.
  • Figure 2: An overview of the proposed BHD-RAG framework.
  • Figure 3: The proposed cosine spatial similarity measure retriever.
  • Figure 4: (a) Influence of the value of $k$ on BHD-RAG. (b) Quantitative comparison between BHD-RAG and other methods.
  • Figure 5: Qualitative comparison of BHD-RAG and GPT-4-turbo diagnoses. Errors or imprecisions in results are highlighted in red by experts.