YOLO-MED: Multi-Task Interaction Network for Biomedical Images

Suizhi Huang, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Leheng Liu, Hui Zhou, Hongtao Lu

TL;DR

This study proposes YOLO-Med, an efficient end-to-end multi-task network that concurrently performs object detection and semantic segmentation. It uses a backbone and a neck for multi-scale feature extraction, and a cross-scale task-interaction module to facilitate information fusion between the tasks.

Abstract

Object detection and semantic segmentation are pivotal components of biomedical image analysis. Single-task networks currently achieve promising results on both detection and segmentation. Multi-task networks have gained prominence because they can tackle segmentation and detection simultaneously while also accelerating segmentation inference. Nevertheless, recent multi-task networks face distinct limitations, such as the difficulty of balancing accuracy against inference speed. In addition, they often overlook the integration of cross-scale features, which is especially important for biomedical image analysis. In this study, we propose YOLO-Med, an efficient end-to-end multi-task network that concurrently performs object detection and semantic segmentation. Our model uses a backbone and a neck for multi-scale feature extraction, complemented by two task-specific decoders. A cross-scale task-interaction module facilitates information fusion between the tasks. Evaluated on the Kvasir-seg dataset and a private biomedical image dataset, our model shows promising results in balancing accuracy and speed.
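The abstract describes a shared encoder (backbone plus neck) feeding two task-specific decoders that exchange information through a cross-scale task-interaction module. The NumPy sketch below illustrates that data flow under clearly stated assumptions. It is not the authors' implementation. Everything in it is hypothetical: the names `yolo_med_like_forward`, `TransformerLayer` and `cross_scale_task_interaction`; the two scales (strides 4 and 8); the 1x1 projections standing in for the backbone, neck and decoders; the single-head transformer layer with random weights; the choice to let every task/scale token group attend to every other group jointly; and the head layouts (a one-channel mask, and 4 box values plus 1 objectness score per location).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token's channel vector (no learned scale/shift here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def conv1x1(w, x):
    # 1x1 convolution: (C_out, C_in) weights applied to a (C_in, H, W) map.
    return np.einsum("oc,chw->ohw", w, x)


def avg_pool(x, k):
    # Non-overlapping k x k average pooling on a (C, H, W) map.
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


class TransformerLayer:
    """Hypothetical single-head pre-norm attention + MLP block, random weights."""

    def __init__(self, dim, hidden, rng):
        s = 1.0 / np.sqrt(dim)
        self.wq, self.wk, self.wv = (rng.normal(0.0, s, (dim, dim)) for _ in range(3))
        self.w1 = rng.normal(0.0, s, (dim, hidden))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, dim))

    def __call__(self, x):
        # x: (N, dim) tokens; returns updated tokens and the (N, N) attention map.
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        x = x + attn @ v
        x = x + np.maximum(layer_norm(x) @ self.w1, 0.0) @ self.w2
        return x, attn


def cross_scale_task_interaction(feats, layer):
    # feats: {(task, stride): (C, H, W)}. Assumption: tokens from every task and
    # scale are concatenated so one attention pass mixes information across both.
    keys = list(feats)
    tokens = [feats[k].reshape(feats[k].shape[0], -1).T for k in keys]
    fused, attn = layer(np.concatenate(tokens, axis=0))
    out, start = {}, 0
    for key, t in zip(keys, tokens):
        # Split the fused tokens back into per-(task, scale) feature maps.
        _, h, w = feats[key].shape
        out[key] = fused[start:start + len(t)].T.reshape(-1, h, w)
        start += len(t)
    return out, attn


def yolo_med_like_forward(image, dim=16, seed=0):
    rng = np.random.default_rng(seed)
    # Toy stand-in for the shared backbone + neck: project, then pool to
    # two scales (stride 4 and stride 8).
    stem = conv1x1(rng.normal(0.0, 0.5, (dim, image.shape[0])), image)
    shared = {4: avg_pool(stem, 4), 8: avg_pool(stem, 8)}
    # Toy task-specific projections of the shared multi-scale features.
    feats = {}
    for task in ("det", "seg"):
        for stride, f in shared.items():
            w = rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, dim))
            feats[(task, stride)] = conv1x1(w, f)
    fused, attn = cross_scale_task_interaction(feats, TransformerLayer(dim, 2 * dim, rng))
    # Hypothetical heads: mask logits at stride 4; 4 box values + 1 objectness
    # score per location at stride 8.
    seg_logits = conv1x1(rng.normal(0.0, 0.1, (1, dim)), fused[("seg", 4)])
    det_out = conv1x1(rng.normal(0.0, 0.1, (5, dim)), fused[("det", 8)])
    return seg_logits, det_out, attn


seg, det, attn = yolo_med_like_forward(np.random.default_rng(1).random((3, 32, 32)))
print(seg.shape, det.shape, attn.shape)  # prints (1, 8, 8) (5, 4, 4) (160, 160)
```

Joint attention over the concatenated tokens is one simple way to let each task read the other task's features at every scale. The returned attention matrix is a token-to-token map that could be inspected for cross-task and cross-scale correlations. The decoupled detection head mentioned for Figure 2 is not modeled here.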

Paper Structure

This paper contains 11 sections, 10 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Comparison between encoder-decoder structure and our cross-scale task-interaction structure.
  • Figure 2: The architecture of the YOLO-Med network. YOLO-Med shares one encoder and combines two decoders with a cross-scale task-interaction module to solve the different tasks. The encoder consists of a backbone and a neck, and the detection head uses a decoupled head module.
  • Figure 3: The architecture of (a) cross-scale task-interaction module and (b) transformer layer.
  • Figure 4: Qualitative comparison with two multi-task networks, MULAN [mulan] and UOLO [uolo], on Kvasir-seg [Kvasir]. Detection and segmentation results are shown together in the same figure.
  • Figure 5: Example correlation maps for the four outputs of the CSTI module. (a) shows the correlation pattern when detecting and segmenting small objects, and (b) shows the same for large objects.