Self-supervised Mamba-based Mastoidectomy Shape Prediction for Cochlear Implant Surgery
Yike Zhang, Eduardo Davalos, Dingjie Su, Ange Lou, Jack H. Noble
TL;DR
The paper addresses predicting the region removed during mastoidectomy for cochlear implant surgery directly from preoperative CT scans, without manual labeling. It introduces a self-supervised SegMamba-based network with a pretrained SAM-Med3D encoder that learns an inverted removal map $\delta$ from the preoperative CT $\rho$ via $f_\theta(\rho) = \delta$. The product $\rho \otimes \delta$ gives a post-mastoidectomy volume, from which a 3D surface is reconstructed by isosurfacing. Training relies on noisy postoperative CT data $\omega$ and two novel loss functions, $L_{msssim\_cscc}$ and $L_{smooth}$. $L_{msssim\_cscc}$ combines MS-SSIM with Squared Cross-Correlation across $M=5$ scales to robustly compare $\rho \otimes \delta$ with $\omega$, and $L_{smooth}(\delta) = \sum_{i=1}^{N} \|\nabla(\delta_i)\|^2$ enforces spatial smoothness on $\delta$. The approach is evaluated on 751 preop/postop CT pairs (630 train/val, 121 test) with 32 ground-truth labels, achieving a mean Dice of $0.70$ and outperforming several Transformer/UNet baselines. These results indicate practical potential for preoperative planning and for downstream tasks such as tool tracking and surgical scene understanding. The work reduces labeling requirements and provides a pathway to realistic, texture-enhanced multi-view surgical visualizations in CI procedures.
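To make the two formulas in the summary concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that $\otimes$ denotes a voxel-wise product, that "inverted" means $\delta$ is close to 0 inside the removed region and close to 1 elsewhere, and that $\nabla$ is discretized with forward differences. The function names `apply_removal_map` and `smoothness_loss` are hypothetical.

```python
import numpy as np


def apply_removal_map(rho: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Return rho (x) delta, assuming (x) is a voxel-wise product."""
    if rho.shape != delta.shape:
        raise ValueError("rho and delta must have the same shape")
    return rho * delta


def smoothness_loss(delta: np.ndarray) -> float:
    """Sum of squared forward differences of delta along every axis.

    This is one possible discretization of sum_i ||grad(delta_i)||^2;
    the source does not specify which finite-difference scheme is used.
    """
    total = 0.0
    for axis in range(delta.ndim):
        diff = np.diff(delta, axis=axis)  # forward difference along this axis
        total += float(np.sum(diff ** 2))
    return total


# Toy 4x4x4 "preoperative CT" with constant intensity 1.0.
rho = np.ones((4, 4, 4))

# Toy inverted removal map: 0 inside a 2x2x2 block, 1 everywhere else.
delta = np.ones((4, 4, 4))
delta[1:3, 1:3, 1:3] = 0.0

post = apply_removal_map(rho, delta)
print("remaining intensity:", post.sum())
print("smoothness penalty:", smoothness_loss(delta))
```

In this toy case the penalty simply counts the voxel faces on the boundary of the removed block, which illustrates why minimizing $L_{smooth}$ favors compact, smooth removal regions over scattered ones.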
Abstract
Cochlear Implant (CI) procedures require the insertion of an electrode array into the cochlea within the inner ear. To achieve this, mastoidectomy, a surgical procedure involving the removal of part of the mastoid region of the temporal bone using a high-speed drill, provides safe access to the cochlea through the middle and inner ear. In this paper, we propose a novel Mamba-based method to synthesize the mastoidectomy volume using only preoperative Computed Tomography (CT) scans, where the mastoid remains intact. Our approach introduces a self-supervised learning framework designed to predict the mastoidectomy shape and reconstruct a 3D post-mastoidectomy surface directly from preoperative CT scans. This reconstruction aligns with intraoperative microscope views, enabling various downstream surgical applications. For training, we leverage postoperative CT scans and bypass manual data cleaning and labeling, even when the region removed during mastoidectomy is affected by metal artifacts, low signal-to-noise ratio, or electrode wiring. Our method achieves a mean Dice score of 0.70 in estimating mastoidectomy regions, demonstrating its effectiveness for accurate and efficient preoperative surgical planning.
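For readers unfamiliar with the reported metric, the Dice score between a predicted binary region $A$ and a ground-truth region $B$ is $2|A \cap B| / (|A| + |B|)$. The sketch below computes it with NumPy on toy masks. The function name `dice_score` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention rather than something the source specifies.

```python
import numpy as np


def dice_score(pred, target) -> float:
    """Dice overlap 2|A n B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    denom = int(pred.sum()) + int(target.sum())
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = int(np.logical_and(pred, target).sum())
    return 2.0 * intersection / denom


# Toy masks: 3 predicted voxels, 3 true voxels, 2 of them shared.
pred_mask = np.array([1, 1, 1, 0])
true_mask = np.array([0, 1, 1, 1])
score = dice_score(pred_mask, true_mask)
print(f"Dice: {score:.3f}")
```

Here the two masks share 2 voxels out of 3 each, so the score is $2 \cdot 2 / 6 \approx 0.667$, close to the mean value reported above.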
