ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs

Zihao Zhao, Sheng Wang, Jinchen Gu, Yitao Zhu, Lanzhuju Mei, Zixu Zhuang, Zhiming Cui, Qian Wang, Dinggang Shen

TL;DR

This work tackles the need for universal and reliable CAD-assisted medical dialogue by fusing cross-domain image interpretation with LLM-based reporting and a knowledge-backed interaction module. The proposed architecture comprises Reliable Report Generation (universal interpretation via domain identification and hierarchical in-context learning) and Reliable Interaction (LLM-guided knowledge retrieval from Merck Manuals) to deliver clinically reliable reports and advice. Experiments demonstrate robust domain identification with BiomedCLIP, competitive report generation across diverse LLMs, and improved clinical question answering through external knowledge retrieval. The results suggest strong potential for cross-domain CAD systems in clinical workflows, while acknowledging limitations such as reliance on external databases and the latency of knowledge retrieval.

Abstract

The integration of Computer-Aided Diagnosis (CAD) with Large Language Models (LLMs) presents a promising frontier in clinical applications, notably in automating diagnostic processes akin to those performed by radiologists and providing consultations similar to a virtual family doctor. Despite this potential, current works face at least two limitations: (1) From the perspective of a radiologist, existing studies typically cover a restricted scope of imaging domains, failing to meet the diagnostic needs of different patients. In addition, the insufficient diagnostic capability of LLMs further undermines the quality and reliability of the generated medical reports. (2) Current LLMs lack the requisite depth of medical expertise, which makes them less effective as virtual family doctors because the advice they provide during patient consultations may be unreliable. To address these limitations, we introduce ChatCAD+, designed to be universal and reliable. Specifically, it features two main modules: (1) Reliable Report Generation and (2) Reliable Interaction. The Reliable Report Generation module interprets medical images from diverse domains and generates high-quality medical reports via our proposed hierarchical in-context learning. Concurrently, the Reliable Interaction module leverages up-to-date information from reputable medical websites to provide reliable medical advice. Together, these modules closely align with the expertise of human medical professionals, offering enhanced consistency and reliability for interpretation and advice. The source code is available at https://github.com/zhaozh10/ChatCAD.

Paper Structure

This paper contains 22 sections, 4 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Overview of our proposed ChatCAD+ system. (a) For patients seeking a diagnosis, ChatCAD+ generates reliable medical reports based on the input medical image(s) by referring to a local report database. (b) Additionally, for any inquiry from patients, ChatCAD+ retrieves related knowledge from an online database and lets the large language model generate a reliable response.
  • Figure 2: Overview of the reliable report generation. (a) Universal interpretation: To enhance the vision capability of LLMs, multiple CAD models are incorporated. The numerical results obtained from these models are transformed into visual descriptions following the rule of prob2text. (b) Hierarchical in-context learning: The LLM initially generates a preliminary report based on the visual description, which is then enhanced through in-context learning using retrieved semantically similar reports. LLM and CLIP models are kept frozen, while CAD models are denoted as trainable as they can be continually trained and updated without interfering with other components.
  • Figure 3: The illustration of the retrieval module within reliable report generation. It adopts the TF-IDF algorithm to preserve the semantics of each report and converts it into a latent embedding during offline modeling and online inference. To facilitate highly efficient retrieval, we perform spherical projection on all TF-IDF embeddings, whether during building or querying. In this manner, we can utilize the KD-Tree structure to store these data and implement retrieval with a low time complexity.
  • Figure 4: Overview of the reliable interaction. (a) Illustration of the structured medical knowledge database, organized as a tree-like dictionary, where each medical topic has multiple sections and sections can be further divided into subsections. (b) An LLM-based knowledge retrieval method is proposed to search for relevant knowledge in a backtracking manner. (c) The LLM is prompted to answer the question based on the retrieved knowledge.
  • Figure 5: Evaluation of domain identification using different CLIPs.
  • ...and 4 more figures
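
The retrieval pipeline described in the Figure 3 caption (TF-IDF embedding, spherical projection, KD-Tree lookup) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "spherical projection" means L2 normalization onto the unit sphere, uses a simple whitespace tokenizer and a smoothed IDF formula chosen for illustration, and all function names (`to_sphere`, `embed`, `fit_tfidf`, `build_kdtree`, `nearest`) and the three sample reports are hypothetical.

```python
import math
from collections import Counter


def to_sphere(vec):
    """Project a vector onto the unit sphere (assumed here: L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def embed(tokens, vocab, idf):
    """TF-IDF embedding of a token list, projected onto the unit sphere."""
    counts = Counter(tokens)
    return to_sphere([counts[t] * idf[t] for t in vocab])


def fit_tfidf(docs):
    """Offline step: build vocabulary, smoothed IDF weights, and embeddings."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(tokenized)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf, [embed(doc, vocab, idf) for doc in tokenized]


def build_kdtree(points, depth=0):
    """Build a KD-tree from (vector, payload) pairs; a node is a 4-tuple or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Online step: return (distance, payload) of the nearest stored vector."""
    if node is None:
        return best
    (vec, payload), axis, left, right = node
    dist = math.dist(vec, query)
    if best is None or dist < best[0]:
        best = (dist, payload)
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical report database, made up for illustration only.
reports = [
    "cardiomegaly with small left pleural effusion",
    "no acute cardiopulmonary abnormality",
    "periapical lesion near the lower left molar",
]
vocab, idf, vectors = fit_tfidf(reports)
tree = build_kdtree(list(zip(vectors, reports)))

query = embed("pleural effusion and cardiomegaly".split(), vocab, idf)
dist, match = nearest(tree, query)
print(match)
```

Under the normalization assumption, the squared Euclidean distance between two unit vectors equals 2 minus twice their cosine similarity, so the Euclidean nearest neighbor returned by the KD-Tree is also the most cosine-similar report. That equivalence is one plausible reason to project onto the sphere before indexing.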