BiSe-Unet: A Lightweight Dual-path U-Net with Attention-refined Context for Real-time Medical Image Segmentation

M Iffat Hossain, Laura Brattain

TL;DR

BiSe-UNet is a lightweight dual-path U-Net that pairs an attention-refined context path with a shallow spatial path to preserve fine detail, then reconstructs the segmentation mask with a depthwise separable decoder. On the Kvasir-Seg dataset it achieves competitive Dice and IoU scores while sustaining more than 30 FPS on a Raspberry Pi 5, making it suitable for accurate, deployable medical image segmentation on edge hardware.

Abstract

During image-guided procedures, real-time image segmentation is often required. This demands lightweight AI models that can operate on resource-constrained devices. One important use case is endoscopy-guided colonoscopy, where polyps must be detected in real time. The Kvasir-Seg dataset, a publicly available benchmark for this task, contains 1,000 high-resolution endoscopic images of polyps with corresponding pixel-level segmentation masks. Achieving real-time inference speed for clinical deployment in constrained environments requires highly efficient and lightweight network architectures. However, many existing models remain too computationally intensive for embedded deployment. Lightweight architectures, although faster, often suffer from reduced spatial precision and weaker contextual understanding, leading to degraded boundary quality and reduced diagnostic reliability. To address these challenges, we introduce BiSe-UNet, a lightweight dual-path U-Net that integrates an attention-refined context path with a shallow spatial path for detailed feature preservation, followed by a depthwise separable decoder for efficient reconstruction. Evaluated on the Kvasir-Seg dataset, BiSe-UNet achieves competitive Dice and IoU scores while sustaining real-time throughput exceeding 30 FPS on Raspberry Pi 5, demonstrating its effectiveness for accurate, lightweight, and deployable medical image segmentation on edge hardware.
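
The efficiency argument for a depthwise separable decoder can be illustrated with a simple weight count. The sketch below compares a standard 3x3 convolution with a depthwise separable one (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution). The channel width of 64 and the function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def conv_params(c_in, c_out, k=3, bias=False):
    """Number of weights in a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dsconv_params(c_in, c_out, k=3, bias=False):
    """Number of weights in a depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer width, for illustration only.
standard = conv_params(64, 64)
separable = dsconv_params(64, 64)
reduction = standard / separable

print(f"standard: {standard}, separable: {separable}, "
      f"reduction: {reduction:.2f}x")
```

For this assumed layer the separable variant needs roughly one eighth of the weights, which is the kind of saving that makes such blocks attractive on embedded hardware.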

Paper Structure (9 sections, 1 equation, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The workflow shows BiSe-UNet extracting spatial and contextual features in two paths, merging them once for efficient integration, and decoding with a lightweight DSConv module to retain boundaries with minimal computation.
  • Figure 2: Polyp segmentation performance of BiSe-UNet compared with popular existing segmentation models.
  • Figure 3: Ablation of BiSe-UNet tested on Raspberry Pi 5. The square (DSConv encoder) improves speed but lowers the Dice score compared to the base model (circle); the diamond (DSConv in both encoder and decoder) maintains high accuracy, while the triangle (final BiSe-UNet with custom skip connections) combines good speed with a good Dice score.
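
The Dice and IoU scores referred to in the abstract and figures follow the standard overlap definitions for binary masks. The sketch below is a minimal pure-Python version under that assumption. The paper's exact evaluation protocol (thresholding, per-image versus dataset averaging) is not given here, and the function name and toy masks are hypothetical.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two equally sized binary masks (2D lists of 0/1)."""
    inter = pred_sum = target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += p & t       # pixel is foreground in both masks
            pred_sum += p
            target_sum += t
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treat as a perfect match (a common convention).
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou


# Toy masks, for illustration only.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

Dice is never lower than IoU for the same pair of masks, so comparisons between models should use the same metric throughout.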