RWKV-UNet: Improving UNet with Long-Range Cooperation for Effective Medical Image Segmentation
Juntao Jiang, Jiangning Zhang, Weixuan Liu, Muxuan Gao, Xiaobin Hu, Zhucun Xue, Yong Liu, Shuicheng Yan
TL;DR
RWKV-UNet addresses the challenge of capturing long-range dependencies in medical image segmentation without incurring the high cost of full self-attention. It integrates the Receptance Weighted Key Value (RWKV) mechanism into a U-Net via Global-Local Spatial Perception (GLSP) blocks and Cross-Channel Mix (CCM) skip connections, pairing a robust encoder with a large-kernel decoder. The approach includes pre-trained encoders and scalable variants (Enc-T/S/B; RWKV-UNet-S/T) to balance accuracy and efficiency, and demonstrates state-of-the-art performance across 11 diverse medical imaging datasets. The method currently targets 2D segmentation; future work will extend it to 3D volumes and ultra-lightweight RWKV configurations for broader clinical applicability.
Abstract
In recent years, significant advancements have been made in deep learning for medical image segmentation, particularly with convolutional neural networks (CNNs) and transformer models. However, CNNs face limitations in capturing long-range dependencies, while transformers suffer from high computational complexity. To address these limitations, we propose RWKV-UNet, a novel model that integrates the RWKV (Receptance Weighted Key Value) structure into the U-Net architecture. This integration enhances the model's ability to capture long-range dependencies and improves contextual understanding, which is crucial for accurate medical image segmentation. We build a strong encoder from Global-Local Spatial Perception (GLSP) blocks that combine CNNs and RWKVs. We also propose a Cross-Channel Mix (CCM) module that improves skip connections through multi-scale feature fusion, achieving global channel information integration. Experiments on 11 benchmark datasets show that RWKV-UNet achieves state-of-the-art performance on various types of medical image segmentation tasks. Additionally, the smaller variants RWKV-UNet-S and RWKV-UNet-T balance accuracy and computational efficiency, making them suitable for broader clinical applications.
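The complexity trade-off described above can be illustrated with a rough count. This is a minimal sketch and not the paper's own analysis: the 224x224 input size, the strides, and the function names are assumptions chosen for illustration. It counts only token interactions, ignoring channel width and constant factors. Full self-attention scores every token against every other token, so its cost grows quadratically with the token count, while a linear-time mixer such as RWKV grows in proportion to it.

```python
def token_count(height: int, width: int, stride: int) -> int:
    """Tokens in a feature map downsampled by `stride` along each side."""
    return (height // stride) * (width // stride)


def attention_pairs(n_tokens: int) -> int:
    """Full self-attention compares every token with every token: N * N."""
    return n_tokens * n_tokens


def linear_steps(n_tokens: int) -> int:
    """A linear-time token mixer touches each token once per pass: N."""
    return n_tokens


# Assumed 224x224 input at three hypothetical encoder strides.
for stride in (16, 8, 4):
    n = token_count(224, 224, stride)
    print(f"stride={stride:2d} tokens={n:5d} "
          f"attention={attention_pairs(n):8d} linear={linear_steps(n):5d}")
```

Under these assumptions, halving the stride quadruples the token count and multiplies the attention term by sixteen, which is why higher-resolution stages are the ones where a linear-time mixer helps most.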
