CAF-YOLO: A Robust Framework for Multi-Scale Lesion Detection in Biomedical Imagery
Zilin Chen, Shengnan Lu
TL;DR
CAF-YOLO addresses the challenge of detecting tiny biomedical lesions by combining the strengths of CNNs and transformers in the CAFBlock, which comprises the Attention and Convolution Fusion Module (ACFM) and the Multi-Scale Neural Network (MSNN). Built on YOLOv8, the framework adds global–local feature fusion and multi-scale information aggregation to improve the detection of micro-lesions in blood microscopy and pulmonary CT images. Experiments on BCCD and LUNA16 show state-of-the-art performance, and ablation studies isolate the contributions of ACFM and MSNN. The work provides a practical, end-to-end framework, with publicly available code, to advance medical image object detection and support clinical diagnosis.
Abstract
Object detection is of paramount importance in biomedical image analysis, particularly for lesion identification. While current methods are proficient at identifying and localizing lesions, they often lack the precision needed to detect minute biomedical entities (e.g., abnormal cells or lung nodules smaller than 3 mm), which are critical in blood and lung pathology. To address this challenge, we propose CAF-YOLO, a nimble yet robust method for medical object detection that is based on the YOLOv8 architecture and leverages the strengths of both convolutional neural networks (CNNs) and transformers. Because convolutional kernels have limited capacity to interact with distant information, we introduce an attention and convolution fusion module (ACFM). This module enhances the modeling of both global and local features, enabling the capture of long-range feature dependencies and spatial autocorrelation. Additionally, to overcome the restricted single-scale feature aggregation of the feed-forward networks (FFNs) in transformer architectures, we design a multi-scale neural network (MSNN), which improves multi-scale information aggregation by extracting features at diverse scales. Experimental evaluations on widely used datasets, such as BCCD and LUNA16, validate the rationale and efficacy of CAF-YOLO. The method excels at detecting and precisely localizing diverse and intricate micro-lesions in biomedical imagery. Our code is available at https://github.com/xiaochen925/CAF-YOLO.
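The abstract describes ACFM and MSNN only at a high level. The NumPy sketch below illustrates the two general ideas under stated assumptions; it is not the authors' implementation (the linked repository is the reference for that). Assumptions: ACFM is modelled as a residual sum of a full single-head self-attention branch (global context) and a 3x3 depthwise convolution branch (local context); MSNN is modelled as a transformer FFN whose expanded hidden layer passes through parallel depthwise convolutions with kernel sizes 3, 5 and 7 before a GELU activation. All function names (`acfm_like`, `msnn_like`, `caf_block_like`, `init_params`), kernel sizes and the expansion ratio are hypothetical.

```python
import numpy as np

# Hypothetical, simplified sketch of CAFBlock-style ideas; NOT the authors' code.
# Feature maps are NumPy arrays of shape (H, W, C); there is no batch axis.


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gelu(z):
    # tanh approximation of GELU, a common transformer FFN activation
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def depthwise_conv2d(x, kernels):
    """Per-channel k x k convolution (cross-correlation) with 'same' zero padding.
    x: (H, W, C); kernels: (k, k, C) with odd k."""
    h, w, _ = x.shape
    k = kernels.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding keeps H and W unchanged
    out = np.zeros(x.shape)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def global_attention(x, wq, wk, wv):
    """Single-head self-attention over all H*W positions (global context)."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)  # every pixel becomes a token
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c))  # (HW, HW): every position sees every other
    return (attn @ v).reshape(h, w, c)


def acfm_like(x, p):
    # Assumed fusion: residual sum of a global attention branch and a local conv branch.
    glob = global_attention(x, p["wq"], p["wk"], p["wv"])
    loc = depthwise_conv2d(x, p["local_k"])
    return x + glob + loc


def msnn_like(x, p):
    # Assumed multi-scale FFN: expand channels, mix with parallel depthwise convs of
    # several kernel sizes, apply GELU, then project back (with a residual connection).
    hidden = x @ p["w_in"]  # (H, W, C) -> (H, W, E)
    mixed = gelu(sum(depthwise_conv2d(hidden, k) for k in p["ms_kernels"]))
    return x + mixed @ p["w_out"]  # (H, W, E) -> (H, W, C)


def caf_block_like(x, p):
    return msnn_like(acfm_like(x, p), p)


def init_params(c, expand=2, scales=(3, 5, 7), seed=0):
    # Random weights only to make the sketch runnable; no training is involved.
    rng = np.random.default_rng(seed)
    e = c * expand
    return {
        "wq": rng.normal(0.0, c ** -0.5, (c, c)),
        "wk": rng.normal(0.0, c ** -0.5, (c, c)),
        "wv": rng.normal(0.0, c ** -0.5, (c, c)),
        "local_k": rng.normal(0.0, 0.1, (3, 3, c)),
        "w_in": rng.normal(0.0, c ** -0.5, (c, e)),
        "ms_kernels": [rng.normal(0.0, 0.1, (k, k, e)) for k in scales],
        "w_out": rng.normal(0.0, e ** -0.5, (e, c)),
    }


p = init_params(16)
x = np.random.default_rng(1).normal(size=(8, 8, 16))
print(caf_block_like(x, p).shape)  # (8, 8, 16)

# Perturb the far corner (7, 7): the 3x3 conv output at (0, 0) cannot see it,
# while the attention output at (0, 0) changes.
x_far = x.copy()
x_far[7, 7] += 1.0
local_k = p["local_k"]
print(np.allclose(depthwise_conv2d(x, local_k)[0, 0], depthwise_conv2d(x_far, local_k)[0, 0]))  # True


def attn_at_origin(t):
    return global_attention(t, p["wq"], p["wk"], p["wv"])[0, 0]


print(np.allclose(attn_at_origin(x), attn_at_origin(x_far)))  # False
```

The two printed booleans illustrate the motivation stated in the abstract: a 3x3 kernel's output at one position depends only on its immediate neighbourhood, whereas attention lets every position draw on distant features. Dense self-attention over all H*W positions costs memory quadratic in the number of positions, so practical detectors often restrict or approximate it; the sketch keeps it dense only for clarity.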
