ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
Zihao Zhao, Sheng Wang, Jinchen Gu, Yitao Zhu, Lanzhuju Mei, Zixu Zhuang, Zhiming Cui, Qian Wang, Dinggang Shen
TL;DR
This work tackles the need for universal and reliable CAD-assisted medical dialogue by fusing cross-domain image interpretation with LLM-based reporting and a knowledge-backed interaction module. The proposed architecture comprises Reliable Report Generation (universal interpretation via domain identification and hierarchical in-context learning) and Reliable Interaction (LLM-guided knowledge retrieval from Merck Manuals) to deliver clinically reliable reports and advice. Experiments demonstrate robust domain identification with BiomedCLIP, competitive report generation across diverse LLMs, and improved clinical question answering through external knowledge retrieval. The results suggest strong potential for cross-domain CAD systems in clinical workflows, while acknowledging limitations such as reliance on external databases and the latency of knowledge retrieval.
Abstract
The integration of Computer-Aided Diagnosis (CAD) with Large Language Models (LLMs) presents a promising frontier in clinical applications, notably in automating diagnostic processes akin to those performed by radiologists and in providing consultations similar to those of a virtual family doctor. Despite this potential, current works face at least two limitations: (1) From the perspective of a radiologist, existing studies typically cover a restricted set of imaging domains and so fail to meet the diagnostic needs of different patients. In addition, the insufficient diagnostic capability of LLMs further undermines the quality and reliability of the generated medical reports. (2) Current LLMs lack the requisite depth of medical expertise, which makes them less effective as virtual family doctors because the advice they provide during patient consultations may be unreliable. To address these limitations, we introduce ChatCAD+, which is designed to be universal and reliable. Specifically, it features two main modules: (1) Reliable Report Generation and (2) Reliable Interaction. The Reliable Report Generation module is capable of interpreting medical images from diverse domains and generating high-quality medical reports via our proposed hierarchical in-context learning. Concurrently, the Reliable Interaction module leverages up-to-date information from reputable medical websites to provide reliable medical advice. Together, these modules align closely with the expertise of human medical professionals, offering enhanced consistency and reliability in interpretation and advice. The source code is available at https://github.com/zhaozh10/ChatCAD.
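
To make the domain identification step mentioned in the TL;DR more concrete, the following is a minimal sketch of one common way a CLIP-style model can route an image to a domain: compare the image embedding with text embeddings of candidate domain names and pick the closest one. It is an illustration under stated assumptions, not the authors' implementation. The function names, the domain list, and the three-dimensional toy vectors are all hypothetical; in practice the embeddings would come from an encoder such as BiomedCLIP.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_domain(image_emb, domain_text_embs):
    """Return the best-matching domain name and all similarity scores.

    Hypothetical helper: scores every candidate domain by cosine
    similarity to the image embedding and takes the argmax.
    """
    scores = {name: cosine(image_emb, emb)
              for name, emb in domain_text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings (assumption): stand-ins for encoder outputs,
# chosen only so the example is self-contained.
DOMAIN_TEXT_EMBS = {
    "chest x-ray": [0.9, 0.1, 0.0],
    "dental panoramic x-ray": [0.1, 0.9, 0.1],
    "knee mri": [0.0, 0.2, 0.9],
}

image_emb = [0.8, 0.2, 0.1]  # toy embedding of an input image
domain, scores = identify_domain(image_emb, DOMAIN_TEXT_EMBS)
print(domain)  # prints "chest x-ray"
```

Once a domain is selected, the image could be handed to the matching domain-specific CAD model, whose outputs then feed the LLM-based report generation described above.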
