Live image-based neurosurgical guidance and roadmap generation using unsupervised embedding

Gary Sarwin; Alessandro Carretta; Victor Staartjes; Matteo Zoli; Diego Mazzatenta; Luca Regli; Carlo Serra; Ender Konukoglu

TL;DR

This paper reports the performance of a deep learning-based object detection method, YOLO, on detecting anatomical structures in neurosurgical images, and presents a method for generating neurosurgical roadmaps using unsupervised embedding without assuming exact anatomical matches between patients, the presence of an extensive anatomical atlas, or the need for simultaneous localization and mapping.

Abstract

Advanced minimally invasive neurosurgery navigation relies mainly on Magnetic Resonance Imaging (MRI) guidance. MRI guidance, however, only provides pre-operative information in the majority of cases. Once the surgery begins, the value of this guidance diminishes to some extent because of the anatomical changes due to surgery. Guidance with live image feedback coming directly from the surgical device, e.g., endoscope, can complement MRI-based navigation or be an alternative if MRI guidance is not feasible. With this motivation, we present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos. First, we report the performance of a deep learning-based object detection method, YOLO, on detecting anatomical structures in neurosurgical images. Second, we present a method for generating neurosurgical roadmaps using unsupervised embedding without assuming exact anatomical matches between patients, presence of an extensive anatomical atlas, or the need for simultaneous localization and mapping. A generated roadmap encodes the common anatomical paths taken in surgeries in the training set. At inference, the roadmap can be used to map a surgeon's current location using live image feedback on the path to provide guidance by being able to predict which structures should appear going forward or backward, much like a mapping application. Even though the embedding is not supervised by position information, we show that it is correlated to the location inside the brain and on the surgical path. We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
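
The mapping-application analogy can be illustrated with a minimal sketch of a roadmap lookup. This is not the authors' implementation. It assumes a 1D latent position `z` in [0, 1] and a hypothetical table `ROADMAP` giving, for each structure, the latent interval in which it is expected to be visible. The structure names, interval values, and the helper functions `visible_at` and `expected_next` are all invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical roadmap: for each structure, the latent interval [start, end]
# in which it is expected to be visible. Names and values are made up.
ROADMAP: Dict[str, Tuple[float, float]] = {
    "structure_a": (0.00, 0.30),
    "structure_b": (0.20, 0.55),
    "structure_c": (0.50, 0.80),
    "structure_d": (0.75, 1.00),
}


def visible_at(z: float, roadmap: Dict[str, Tuple[float, float]] = ROADMAP) -> List[str]:
    """Return the structures whose interval contains the latent position z."""
    return [name for name, (start, end) in roadmap.items() if start <= z <= end]


def expected_next(
    z: float, step: float, roadmap: Dict[str, Tuple[float, float]] = ROADMAP
) -> List[str]:
    """Return structures visible at z + step but not at z.

    A positive step looks forward along the path, a negative step looks
    backward. The target position is clamped to [0, 1].
    """
    target = min(1.0, max(0.0, z + step))
    seen_now = set(visible_at(z, roadmap))
    return [name for name in visible_at(target, roadmap) if name not in seen_now]


if __name__ == "__main__":
    print(visible_at(0.25))           # structures at the current position
    print(expected_next(0.25, 0.35))  # what should appear when moving forward
    print(expected_next(0.60, -0.5))  # what should appear when moving backward
```

In the method described above, the per-structure information would presumably come from the learned decoder rather than from a hand-written table; the table only stands in for that output here.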

Paper Structure (13 sections, 1 equation, 6 figures, 1 table)

Introduction
Methods
Problem Formulation and Approach
Object Detection
Embedding
Experiments and Results
Dataset
Implementation Details
Results
Anatomical Structure Detection
Qualitative Assessment of the Embedding
Quantitative Assessment of the Embedding
Conclusion

Figures (6)

Figure 1: Simplified representation of the suggested approach. 1. A sequence of input images is processed to detect bounding boxes of anatomical structures. 2. A neural network encodes the sequence of detections into a latent variable that correlates with the position along the surgical path. 3. Given the current position along the surgical path, an estimation of anatomical structures in the forward or backward directions can be obtained, by extrapolating the current value of the latent variable.
Figure 2: Left: A transsphenoidal adenomectomy is performed to remove a tumor from the pituitary gland, located at the base of the brain. Using an endoscope and various instruments, the surgeon enters through the nostril and crosses the sphenoidal sinus to access the pituitary gland located behind the sella floor. Right: A video frame showing only the anatomy. Note the lack of clear differences between anatomical structures in such images.
Figure 3: The model architecture. The model consists of an encoder and two decoders. The encoder consists of a multi-head attention layer, i.e., a transformer encoder, which takes $\mathbf{C}_t$ as input, followed by a series of fully connected layers to embed the input in a 1D latent dimension. The two decoders consist of fully connected layers to generate the class probabilities $\hat{\mathbf{y}}_t$ and the bounding box coordinates $\hat{\mathbf{b}}_t$, respectively.
Figure 4: The normalized generated confidences of each class along the latent space. This visualizes the probability of finding a certain anatomical structure at a specific point in the latent space. Additionally, video frames of twenty test videos responsible for the first appearances of anatomical structures in every video have been encoded and overlaid onto the confidence intervals to demonstrate that their locations correlate with the beginning of these intervals.
Figure 5: Z-values over time during a surgical video. Certain $z$-values are encoded more frequently than others, such as approximately $z=0.2$ and $z=0.6$, which is related to the amount of time spent at a certain location during the surgery.
...and 1 more figure
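
The architecture described in the Figure 3 caption can be sketched roughly as follows, assuming PyTorch. This is an illustrative reading of the caption, not the authors' code. The class name `RoadmapAutoencoder`, the feature size, number of heads, hidden width, number of classes, the mean pooling over the sequence, and the sigmoid activations are all assumptions that the caption does not specify.

```python
import torch
from torch import nn


class RoadmapAutoencoder(nn.Module):
    """Encoder with multi-head attention and a 1D latent, plus two decoders."""

    def __init__(self, feat_dim: int = 32, num_classes: int = 5,
                 num_heads: int = 4, hidden: int = 64) -> None:
        super().__init__()
        self.num_classes = num_classes
        # Self-attention over the sequence of per-frame detection vectors C_t.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Fully connected layers that embed the input into a 1D latent z.
        # The sigmoid (z in [0, 1]) is an assumption of this sketch.
        self.to_latent = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )
        # Decoder 1: class probabilities y_hat.
        self.class_decoder = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes), nn.Sigmoid(),
        )
        # Decoder 2: bounding box coordinates b_hat (4 values per class,
        # normalized to [0, 1] here by assumption).
        self.box_decoder = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 4 * num_classes), nn.Sigmoid(),
        )

    def forward(self, c_t: torch.Tensor):
        # c_t has shape (batch, seq_len, feat_dim).
        attended, _ = self.attn(c_t, c_t, c_t)
        pooled = attended.mean(dim=1)  # assumed pooling over the sequence
        z = self.to_latent(pooled)     # (batch, 1)
        y_hat = self.class_decoder(z)  # (batch, num_classes)
        b_hat = self.box_decoder(z).view(-1, self.num_classes, 4)
        return z, y_hat, b_hat


if __name__ == "__main__":
    model = RoadmapAutoencoder()
    z, y_hat, b_hat = model(torch.randn(2, 8, 32))
    print(z.shape, y_hat.shape, b_hat.shape)
```

Because both decoders take only the scalar `z` as input, sweeping `z` across its range and reading `y_hat` would give per-class confidence curves along the latent space, which is the kind of view Figure 4 describes.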
