R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray Segmentation
Xuecheng Li, Weikuan Jia, Komildzhon Sharipov, Sharipov Hotam Beknazarovich, Farzona S. Ataeva, Qurbonaliev Alisher, Yuanjie Zheng
TL;DR
R2MF-Net tackles the challenge of robust spine segmentation across multi-directional X-ray views by introducing a two-stage cascaded architecture with recurrent residual skip connections (R2-Jump), multi-scale cross-stage skip (MC-Skip), and SCSE-Lite attention, all trained jointly on coronal and bending views. The method achieves higher IoU, Dice and boundary accuracy than strong baselines, demonstrates improved robustness to image quality variations, and maintains feasible computational cost for clinical use. These contributions enable more reliable automated scoliosis measurements and pave the way for streamlined, reproducible spine assessments in practice. The work also provides thorough ablations, analyses of robustness and complexity, and discussion of limitations and future directions toward vertebra-level modeling and multi-center validation.
Abstract
Accurate segmentation of spinal structures in X-ray images is a prerequisite for quantitative scoliosis assessment, including Cobb angle measurement, vertebral translation estimation and curvature classification. In routine practice, clinicians acquire coronal, left-bending and right-bending radiographs to jointly evaluate deformity severity and spinal flexibility. However, the segmentation step remains heavily manual, time-consuming and non-reproducible, particularly in low-contrast images and in the presence of rib shadows or overlapping tissues. To address these limitations, this paper proposes R2MF-Net, a recurrent residual multi-path encoder-decoder network tailored for automatic segmentation of multi-directional spine X-ray images. The overall design consists of a coarse segmentation network and a fine segmentation network connected in cascade. Both stages adopt an improved Inception-style multi-branch feature extractor, while a recurrent residual jump connection (R2-Jump) module is inserted into skip paths to gradually align encoder and decoder semantics. A multi-scale cross-stage skip (MC-Skip) mechanism allows the fine network to reuse hierarchical representations from multiple decoder levels of the coarse network, thereby strengthening the stability of segmentation across imaging directions and contrast conditions. Furthermore, a lightweight spatial-channel squeeze-and-excitation block (SCSE-Lite) is employed at the bottleneck to emphasize spine-related activations and suppress irrelevant structures and background noise. We evaluate R2MF-Net on a clinical multi-view radiograph dataset comprising 228 sets of coronal, left-bending and right-bending spine X-ray images with expert annotations.
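
To make the bottleneck attention idea concrete, the following is a minimal NumPy sketch of a generic concurrent spatial and channel squeeze-and-excitation gate applied to a single feature map. It is an illustration under stated assumptions, not the SCSE-Lite block itself: the abstract does not specify the reduction ratio, how the two branches are combined, or what makes the block "Lite". The function name `scse_gate`, the weight names, the bias-free layers and the choice to sum the two branches are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def scse_gate(feat, w_reduce, w_expand, w_spatial):
    """Generic spatial-channel squeeze-and-excitation on one feature map.

    feat:      array of shape (C, H, W)
    w_reduce:  (C_r, C) weights of the channel bottleneck (assumed, no bias)
    w_expand:  (C, C_r) weights restoring the channel count (assumed, no bias)
    w_spatial: (C,) weights of a 1x1 convolution giving one spatial map
    """
    # Channel branch: global average pool, two linear layers, sigmoid gate.
    squeezed = feat.mean(axis=(1, 2))                # (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)    # ReLU, (C_r,)
    channel_gate = sigmoid(w_expand @ hidden)        # (C,)
    channel_out = feat * channel_gate[:, None, None]

    # Spatial branch: 1x1 convolution across channels, sigmoid gate per pixel.
    spatial_logits = np.tensordot(w_spatial, feat, axes=(0, 0))  # (H, W)
    spatial_gate = sigmoid(spatial_logits)
    spatial_out = feat * spatial_gate[None, :, :]

    # Assumption: the two recalibrated maps are summed; other fusions exist.
    return channel_out + spatial_out


# Usage with random weights (C=8 channels, reduction to 2, 4x4 spatial grid).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
out = scse_gate(
    feat,
    rng.standard_normal((2, 8)),
    rng.standard_normal((8, 2)),
    rng.standard_normal(8),
)
print(out.shape)
```

The output keeps the shape of the input, so a gate of this kind can sit at a bottleneck without changing the surrounding encoder or decoder. With all weights set to zero, both gates equal 0.5 and the sum returns the input unchanged, which is a convenient sanity check.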
