CAF-YOLO: A Robust Framework for Multi-Scale Lesion Detection in Biomedical Imagery

Zilin Chen, Shengnan Lu

TL;DR

CAF-YOLO addresses the challenge of detecting tiny biomedical lesions by combining the strengths of CNNs and transformers in the CAFBlock, which comprises the Attention and Convolution Fusion Module (ACFM) and the Multi-Scale Neural Network (MSNN). The framework builds on the YOLOv8 architecture, adding global–local feature fusion and multi-scale information aggregation to improve detection of micro-lesions in blood microscopy and pulmonary CT imagery. Experiments on BCCD and LUNA16 show state-of-the-art performance, and ablation studies highlight the individual contributions of ACFM and MSNN. The work provides a practical, end-to-end framework with publicly available code to advance medical image object detection and assist clinical diagnostics.

Abstract

Object detection is of paramount importance in biomedical image analysis, particularly for lesion identification. While current methodologies are proficient in identifying and pinpointing lesions, they often lack the precision needed to detect minute biomedical entities (e.g., abnormal cells, lung nodules smaller than 3 mm), which are critical in blood and lung pathology. To address this challenge, we propose CAF-YOLO, a nimble yet robust method for medical object detection that is based on the YOLOv8 architecture and leverages the strengths of convolutional neural networks (CNNs) and transformers. To overcome the limitation of convolutional kernels, which have a constrained capacity to interact with distant information, we introduce an attention and convolution fusion module (ACFM). This module enhances the modeling of both global and local features, enabling the capture of long-range feature dependencies and spatial autocorrelation. Additionally, to improve the restricted single-scale feature aggregation inherent in feed-forward networks (FFN) within transformer architectures, we design a multi-scale neural network (MSNN). This network improves multi-scale information aggregation by extracting features across diverse scales. Experimental evaluations on widely used datasets, such as BCCD and LUNA16, validate the rationale and efficacy of CAF-YOLO. This methodology excels in detecting and precisely locating diverse and intricate micro-lesions within biomedical imagery. Our code is available at https://github.com/xiaochen925/CAF-YOLO.
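
The contrast the abstract draws between convolutional kernels (local context only) and attention (interaction with distant positions) can be illustrated with a toy 1D example. This is a minimal sketch, not the authors' ACFM: the function names, the scalar self-attention, the fixed 3-tap kernel, and the fusion by element-wise sum are all assumptions chosen for illustration.

```python
import math

def local_branch(x, kernel=(0.25, 0.5, 0.25)):
    """Zero-padded 1D convolution: each output sees only its neighbours."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out

def global_branch(x):
    """Scalar self-attention: each output is a softmax-weighted mix of all positions."""
    out = []
    for q in x:
        scores = [q * k for k in x]            # query-key products
        m = max(scores)                        # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out

def fuse(x):
    """Hypothetical fusion: element-wise sum of the local and global branches."""
    return [a + b for a, b in zip(local_branch(x), global_branch(x))]

# Two signals that differ only at the far end of the sequence.
near = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
far = [1.0, 0.0, 0.0, 0.0, 0.0, 2.0]

# The convolution output at position 0 cannot see the change ...
print(local_branch(near)[0], local_branch(far)[0])
# ... while the attention output at position 0 does.
print(global_branch(near)[0], global_branch(far)[0])
```

In this toy setting the local branch is blind to the distant change while the global branch reacts to it, which is the motivation the abstract gives for fusing the two.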

Paper Structure (18 sections, 5 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Examples of platelets that are difficult to visually discern in blurry blood microscopic images.
  • Figure 2: Illustration of the overall framework of our proposed CAF-YOLO for biomedical image detection. The internal structure of the CAFBlock is illustrated at the top, comprising two layer-normalization layers, an attention and convolution fusion module (ACFM), and a multi-scale neural network (MSNN). We introduce a CAFBlock after the backbone network to facilitate the extraction of both global and local feature representations.
  • Figure 3: Illustration of the proposed attention and convolution fusion module comprising local and global branches. Within the local branch, convolutional operations and channel shuffling are applied to facilitate the extraction of localized features. Conversely, an attention mechanism is employed within the global branch to model and encapsulate long-range feature dependencies effectively.
  • Figure 4: Illustration of the multi-scale neural network (MSNN). In the lower pathway, depthwise convolution is harnessed to facilitate feature extraction, operating with a focus on spatial intricacies. Conversely, the upper pathway utilizes multi-scale dilated convolutions to achieve feature extraction across multiple scales, enabling the capture of diverse contextual information at varying granularities.
  • Figure 5: Validation-set detection results of CAF-YOLO on the BCCD dataset, showing that CAF-YOLO detects positive instances of all classes with satisfactory coverage. The image highlights erythrocytes (pink boxes), leukocytes (red boxes), and platelets (orange boxes), with ground-truth annotations in green boxes.
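
Two operations named in the captions for Figures 3 and 4, channel shuffling and multi-scale dilated convolution, can be sketched in plain Python on 1D data. This is an illustrative sketch only: the helper names, the ShuffleNet-style shuffle order, the dilation rates (1, 2, 3), and the aggregation by summation are assumptions, not details taken from the paper.

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style shuffle: view as (groups, n // groups), transpose, flatten."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]

def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(x, weights, dilation):
    """Zero-padded 'same' 1D convolution with taps spaced `dilation` apart."""
    half = (len(weights) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i + k * dilation - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out

def multi_scale(x, weights, dilations=(1, 2, 3)):
    """Hypothetical aggregation: sum the outputs of several dilation rates."""
    branches = [dilated_conv1d(x, weights, d) for d in dilations]
    return [sum(vals) for vals in zip(*branches)]

# Channels 0-2 form group A and 3-5 form group B; shuffling interleaves them.
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))    # [0, 3, 1, 4, 2, 5]

# A 3-tap kernel spans 3, 5, and 7 positions at dilation 1, 2, and 3.
print([effective_kernel(3, d) for d in (1, 2, 3)])       # [3, 5, 7]

# An impulse shows which positions each dilation rate reaches.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 2))       # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(multi_scale(impulse, [1.0, 1.0, 1.0]))             # [1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
```

The impulse response makes the multi-scale idea concrete: with the same three weights, larger dilation rates reach context farther from the centre, so combining several rates covers several granularities at once.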