Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs
Yuezhe Yang, Yiyue Guo, Wenjie Cai, Qingqing Ruan, Siying Wang, Xingbo Dong, Zhe Jin, Yong Dai
TL;DR
Auto-US addresses the challenge of ultrasound video diagnosis by introducing a multimodal agent that fuses ultrasound video classification with clinical text reasoning. The authors construct the CUV Dataset by integrating public ultrasound video sources and develop CTU-Net, a three-path CNN-Transformer architecture that jointly models spatial, temporal, and frequency information to achieve state-of-the-art accuracy ($86.73\%$) on multi-disease ultrasound videos. The system further integrates Large Language Models to generate clinically meaningful diagnostic suggestions, validated through case studies and an evaluation framework that blends expert judgment with METEOR-based metrics. Together, these components demonstrate notable potential for improved diagnostic efficiency and decision support in real-world ultrasound applications, with publicly available code and data. The work highlights both the promise of multimodal AI in ultrasound and the need for larger, more diverse datasets and richer pathology integration to reach broader clinical adoption.
Abstract
AI-assisted ultrasound video diagnosis presents new opportunities to enhance the efficiency and accuracy of medical imaging analysis. However, existing research remains limited in terms of dataset diversity, diagnostic performance, and clinical applicability. In this study, we propose \textbf{Auto-US}, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text. To support this, we constructed the \textbf{CUV Dataset} of 495 ultrasound videos spanning five categories and three organs, aggregated from multiple open-access sources. We developed \textbf{CTU-Net}, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73\%. Furthermore, by incorporating large language models, Auto-US is capable of generating clinically meaningful diagnostic suggestions. The final diagnostic scores for each case exceeded 3 out of 5 and were validated by professional clinicians. These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications. Code and data are available at: https://github.com/Bean-Young/Auto-US.
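The summary above mentions an evaluation that blends expert judgment with METEOR-based metrics and reports final scores on a 5-point scale. The text here does not give the blending rule, so the following is only a minimal sketch of one way such a score could be formed. The function name `blended_score`, the linear rescaling of METEOR to the 5-point scale, and the default weight of 0.5 are all illustrative assumptions, not the authors' formula.

```python
def blended_score(expert_score: float, meteor: float, expert_weight: float = 0.5) -> float:
    """Blend an expert rating (0-5) with a METEOR value (0-1) on a 5-point scale.

    ASSUMPTION: the linear rescaling and the weight are hypothetical choices
    made for illustration; the source does not specify how the two are combined.
    """
    if not 0.0 <= expert_score <= 5.0:
        raise ValueError("expert_score must lie in [0, 5]")
    if not 0.0 <= meteor <= 1.0:
        raise ValueError("meteor must lie in [0, 1]")
    if not 0.0 <= expert_weight <= 1.0:
        raise ValueError("expert_weight must lie in [0, 1]")

    # Map METEOR from [0, 1] onto the same 5-point scale as the expert rating.
    meteor_on_five = 5.0 * meteor

    # Convex combination of the two components.
    return expert_weight * expert_score + (1.0 - expert_weight) * meteor_on_five


if __name__ == "__main__":
    # Hypothetical inputs: an expert rating of 4 and a METEOR value of 0.5.
    print(blended_score(4.0, 0.5))
```

Under these assumptions, a case passes a "greater than 3 out of 5" threshold only if the weighted combination of both components clears it, so a high expert rating can offset a modest METEOR value and vice versa.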
