SSDD-GAN: Single-Step Denoising Diffusion GAN for Cochlear Implant Surgical Scene Completion
Yike Zhang, Eduardo Davalos, Jack Noble
TL;DR
This work tackles surgical scene completion for cochlear implant procedures by predicting complete microscopic views from partial data. The authors introduce SSDD-GAN, a self-supervised single-step denoising diffusion-GAN that combines diffusion-based generation with adversarial refinement to produce high-fidelity, semantically coherent reconstructions, trained on real surgical data and applied zero-shot to a synthetic postmastoidectomy dataset. Results show superior performance across multiple metrics compared with existing methods and demonstrate robustness across varying mask sizes, enabling realistic synthetic surgical scenes with accurate camera poses. By providing full surgical field visualization and navigation capability, the approach holds potential to improve preoperative planning, intraoperative guidance, and tool tracking in image-guided cochlear implant surgery.
Abstract
Recent deep learning-based image completion methods, covering both inpainting and outpainting, have shown promising results in restoring corrupted images by filling in missing regions. Among these, Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) are key generative approaches that produce high-quality restorations with reduced artifacts and improved fine detail. In previous work, we developed a method for synthesizing views from novel microscope positions for mastoidectomy surgeries; however, that approach could not restore the surrounding surgical scene. In this paper, we propose an efficient method to complete the surgical scene of the synthetic postmastoidectomy dataset. Our approach uses self-supervised learning on real surgical datasets to train a Single-Step Denoising Diffusion-GAN (SSDD-GAN), which combines the advantages of diffusion models with the adversarial optimization of GANs and improves Structural Similarity by 6%. The trained model is then applied directly to the synthetic postmastoidectomy dataset in a zero-shot manner, generating realistic and complete surgical scenes without requiring explicit ground-truth labels from that dataset. This method addresses key limitations of previous work, offering a novel pathway to full surgical microscopy scene completion and making the synthetic postmastoidectomy dataset more usable for preoperative planning and intraoperative navigation.
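To make the self-supervised setup more concrete, the sketch below shows one plausible way to build a training pair from a complete surgical frame: the border is hidden so that the original frame serves as its own ground truth, and a single noising step is applied using the standard DDPM forward formula. This is an illustration under stated assumptions, not the authors' implementation. The function names `make_outpainting_pair` and `add_noise`, the centered rectangular mask, and the `keep_fraction` and `alpha_bar` parameters are all hypothetical.

```python
import numpy as np


def make_outpainting_pair(image, keep_fraction=0.5):
    """Hide everything outside a centered window of an (H, W, C) image.

    Returns (masked_image, mask, target). The target is the untouched
    input, so no external labels are needed (self-supervision).
    The centered rectangular window is an assumption for illustration.
    """
    h, w = image.shape[:2]
    kh = max(1, int(h * keep_fraction))
    kw = max(1, int(w * keep_fraction))
    top, left = (h - kh) // 2, (w - kw) // 2

    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + kh, left:left + kw] = True  # True marks visible pixels

    # Broadcast the 2-D mask over the channel axis; hidden pixels become 0.
    masked_image = np.where(mask[..., None], image, 0.0)
    return masked_image, mask, image


def add_noise(x0, alpha_bar, rng):
    """One forward diffusion step in the standard DDPM form:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    The value of alpha_bar is a hypothetical choice, not from the paper.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64, 3))  # stand-in for a real surgical frame
    masked, mask, target = make_outpainting_pair(frame, keep_fraction=0.5)
    noisy, eps = add_noise(target, alpha_bar=0.5, rng=rng)
    print(masked.shape, mask.sum(), noisy.shape)
```

Varying `keep_fraction` across training samples is one simple way to expose a model to different mask sizes. A generator trained on such pairs would receive the masked frame together with the noisy target and be asked to recover the full frame in a single denoising step, with a discriminator judging realism.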
