Redefining Cystoscopy with AI: Bladder Cancer Diagnosis Using an Efficient Hybrid CNN-Transformer Model
Meryem Amaouche, Ouassim Karrakchou, Mounir Ghogho, Anouar El Ghazzaly, Mohamed Alami, Ahmed Ameur
TL;DR
Bladder cancer diagnosis via cystoscopy is operator-dependent and prone to missed detections. The authors present a lightweight hybrid CNN-Transformer network with a transformer bottleneck and Dual Attention Gates (DAGs) that achieves accurate, real-time segmentation while keeping the model small. They also introduce a hospital-developed cystoscopy dataset and show through ablation that combining DAGs with a single transformer block yields substantial gains, reaching IoU ≈ $85.7\%$ and Dice ≈ $92\%$ with only about $0.36$M parameters. The approach outperforms several CNN-based baselines and remains competitive with larger transformer-based models, making it suitable for real-time clinical deployment and broadening accessibility in resource-constrained settings.
Abstract
Bladder cancer ranks among the 10 most diagnosed cancers worldwide and is among the most expensive to treat because its high recurrence rate requires lifelong follow-up. The primary diagnostic tool is cystoscopy, which relies heavily on the physician's expertise and interpretation. As a result, many cases each year go undiagnosed or are misdiagnosed and treated as urinary infections. To address this, we propose a deep learning approach for bladder cancer detection and segmentation that combines CNNs with a lightweight, positional-encoding-free transformer and dual attention gates that fuse self-attention and spatial attention for feature enhancement. The proposed architecture is efficient, making it suitable for medical scenarios that require real-time inference. Experiments show that the model addresses the critical need to balance computational efficiency and diagnostic accuracy in cystoscopic imaging: despite its small size, it rivals large models in performance.
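For readers less familiar with the segmentation metrics quoted in the TL;DR, the following is a minimal sketch of how IoU and Dice are commonly computed for a pair of binary masks. It is not the authors' evaluation code: the function name `iou_and_dice`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally sized binary masks.

    Masks are nested lists of 0/1 values (an assumed format for this sketch).
    """
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as lesion in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    pred_area = sum(pred_flat)
    target_area = sum(target_flat)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty; treating this as a perfect match is an
        # assumed convention, not something stated in the paper.
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 3x4 masks: 4 predicted pixels, 4 ground-truth pixels, 3 shared.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
target = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

iou, dice = iou_and_dice(pred, target)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")  # IoU = 0.600, Dice = 0.750
```

For a single pair of masks the two metrics are tied by Dice = 2·IoU / (1 + IoU), so an IoU of 0.857 corresponds to a Dice of roughly 0.923, which is in line with the reported figures. Scores averaged over a dataset need not satisfy this identity exactly.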
