Lagrange Duality and Compound Multi-Attention Transformer for Semi-Supervised Medical Image Segmentation
Fuchen Zheng, Quanjun Li, Weixuan Li, Xuhang Chen, Yihang Dong, Guoheng Huang, Chi-Man Pun, Shoujun Zhou
TL;DR
The paper tackles the long-tail data problem in medical image segmentation by introducing CMAformer, a ResUNet-Transformer hybrid that uses cross-attention for multi-scale feature fusion and channel-aware patch embedding. It pairs this architecture with a Lagrange Duality Consistency (LDC) loss and a boundary-aware contrastive objective to exploit unlabeled data in a semi-supervised setting, with the aim of improving the segmentation of small lesions. Key contributions include the CMAformer design with cross-attention, a theoretically grounded LDC loss formulated via Lagrangian duality and the Karush-Kuhn-Tucker (KKT) conditions, and strong empirical performance on the LiTS2017 and Synapse datasets, even with reduced labeled data. The work reports state-of-the-art or competitive results and improved small-object segmentation, and it offers a practical semi-supervised framework that can be extended to broader clinical tasks.
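The summary describes the LDC loss only at a high level. As a hedged illustration of the general idea (not the paper's actual loss), the sketch below treats a consistency term as a constraint, forms the Lagrangian, and solves it by dual ascent on a one-parameter toy problem. The functions `sup_loss_grad`, `consistency`, and `dual_ascent`, the quadratic losses, the budget `eps`, and all step sizes are hypothetical choices made for this example.

```python
# Toy problem (hypothetical): minimise a "supervised" loss subject to a
# "consistency" budget, i.e.  min_θ (θ - 3)^2  s.t.  C(θ) = θ^2 <= eps.


def sup_loss_grad(theta):
    """Gradient of the toy supervised loss (θ - 3)^2."""
    return 2.0 * (theta - 3.0)


def consistency(theta):
    """Toy consistency penalty C(θ) = θ^2 (hypothetical stand-in)."""
    return theta ** 2


def consistency_grad(theta):
    """Gradient of C(θ)."""
    return 2.0 * theta


def dual_ascent(eps=1.0, outer_steps=300, inner_steps=200, lr=0.05, dual_lr=0.5):
    """Alternate primal minimisation of the Lagrangian and projected ascent on λ."""
    theta, lam = 0.0, 0.0
    for _ in range(outer_steps):
        # Primal step: gradient descent on L(θ, λ) = loss(θ) + λ (C(θ) - eps)
        for _ in range(inner_steps):
            grad = sup_loss_grad(theta) + lam * consistency_grad(theta)
            theta -= lr * grad
        # Dual step: gradient ascent on λ, projected onto λ >= 0 (dual feasibility)
        lam = max(0.0, lam + dual_lr * (consistency(theta) - eps))
    return theta, lam


theta, lam = dual_ascent()
print(f"theta = {theta:.4f}, lambda = {lam:.4f}")
# Complementary slackness: λ (C(θ) - eps) should be close to 0 at the optimum
print(f"slackness = {lam * (consistency(theta) - 1.0):.2e}")
```

For this toy problem the KKT conditions can be checked by hand: with the constraint active (θ = 1), stationarity 2(θ - 3) + 2λθ = 0 gives λ = 2 ≥ 0, so the iterates should approach θ ≈ 1 and λ ≈ 2, with λ(C(θ) - eps) ≈ 0.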
Abstract
Medical image segmentation, a critical application of semantic segmentation in healthcare, has advanced significantly through specialized computer vision techniques. Deep learning-based medical image segmentation is essential for assisting medical diagnosis, but the lack of diverse training data leads to a long-tail problem. Moreover, most previous hybrid CNN-ViT architectures have a limited ability to combine different types of attention across the layers of the convolutional neural network. To address these issues, we propose a Lagrange Duality Consistency (LDC) Loss, integrated with a Boundary-Aware Contrastive Loss, as the overall training objective for semi-supervised learning to mitigate the long-tail problem. Additionally, we introduce CMAformer, a novel network that combines the strengths of ResUNet and Transformer. The cross-attention block in CMAformer integrates spatial attention and channel attention for multi-scale feature fusion. Overall, our results indicate that CMAformer, combined with the feature fusion framework and the new consistency loss, is strongly complementary in semi-supervised learning ensembles. We achieve state-of-the-art results on multiple public medical image datasets. Example code is available at https://github.com/lzeeorno/Lagrange-Duality-and-CMAformer.
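The abstract states that the cross-attention block integrates spatial and channel attention for multi-scale feature fusion, but it does not give the block's internals. As a rough, hypothetical illustration only, the following NumPy sketch fuses a fine and a coarse feature map by nearest-neighbour upsampling and addition, then applies CBAM-style channel gating followed by spatial gating. This is a swapped-in technique, not CMAformer's cross-attention block, and the function names, feature shapes, reduction ratio `r`, and random weights are all assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample_nearest(x, factor):
    """(C, H, W) -> (C, H*factor, W*factor) by repeating each pixel."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style gate: one weight in (0, 1) per channel."""
    s = x.mean(axis=(1, 2))          # global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)      # bottleneck projection + ReLU -> (C // r,)
    gate = sigmoid(w2 @ z)           # back to (C,), squashed into (0, 1)
    return x * gate[:, None, None]


def spatial_attention(x, w_mean, w_max):
    """Gate each pixel using channel-pooled statistics.

    CBAM uses a 7x7 convolution over the [mean; max] maps; the scalar
    weights here are a simplification for this sketch (hypothetical).
    """
    m = w_mean * x.mean(axis=0) + w_max * x.max(axis=0)   # (H, W)
    return x * sigmoid(m)[None, :, :]


def fuse(fine, coarse, params):
    """Hypothetical multi-scale fusion: upsample + add, then channel and spatial gates."""
    factor = fine.shape[1] // coarse.shape[1]
    x = fine + upsample_nearest(coarse, factor)
    x = channel_attention(x, params["w1"], params["w2"])
    return spatial_attention(x, params["w_mean"], params["w_max"])


rng = np.random.default_rng(0)
C, r = 8, 2                          # channels and reduction ratio (assumed)
params = {
    "w1": rng.standard_normal((C // r, C)) * 0.1,
    "w2": rng.standard_normal((C, C // r)) * 0.1,
    "w_mean": 1.0,
    "w_max": 1.0,
}
fine = rng.standard_normal((C, 16, 16))     # higher-resolution features
coarse = rng.standard_normal((C, 8, 8))     # lower-resolution features
out = fuse(fine, coarse, params)
print(out.shape)  # (8, 16, 16)
```

Because every gate lies in (0, 1), this block can only rescale the fused features; in a trained network, learned projections would decide which channels and spatial positions are emphasized.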
