
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

TL;DR

The paper tackles automated diagnostic captioning for radiology images to assist clinicians and improve report quality. It investigates two Transformer-based pipelines, an encoder-decoder model called VisionDiagnostor and a BLIP2-inspired Query Transformer variant, using ViT for global visual features and VinVL for object features, and evaluates them on the ImageCLEFmedical 2024 data. The BioBART-based VisionDiagnostor scores highest among the team's submissions, with a BERTScore of 0.6267, placing DarkCow third overall; analyses show that object features, caption length, and preprocessing all influence results. The work demonstrates the feasibility of high-quality automated radiology captions and suggests directions for further improvement, including integration with biomedical LLMs and retrieval-augmented strategies.

Abstract

Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.

Paper Structure (26 sections, 11 equations, 8 figures, 7 tables)


Figures (8)

  • Figure 1: Several images from the ImageCLEFmedical2024 dataset
  • Figure 2: Distribution of caption lengths in the training set
  • Figure 3: Distribution of caption lengths in the validation set
  • Figure 4: Application of a Gaussian filter
  • Figure 5: The image after a series of processing steps
  • ...and 3 more figures