Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance

Wanwen Chen, Adam Schmidt, Eitan Prisman, Septimiu E. Salcudean

TL;DR

The paper tackles real-time neck ultrasound (US) guidance during transoral robotic surgery by framing it as an image retrieval problem against a preoperative US database. It introduces a self-supervised intra-sweep contrastive learning approach that uses semantic similarity within US sweeps and US probe location to learn robust representations, adds a learnable dustbin threshold to reject uncertain matches, and uses a triplet loss to refine the embeddings. On simulated data, it achieves 92.30% retrieval accuracy and outperforms temporal-based contrastive learning baselines, and an ablation study indicates that probe-location cues improve the learned representation. A pilot demonstration on real patient data suggests that the US probe can be localized by image retrieval despite tissue deformation from tongue retraction, pointing toward practical, tracking-free intraoperative guidance.

Abstract

Purpose: Intraoperative ultrasound (US) can enhance real-time visualization in transoral robotic surgery. The surgeon creates a mental map with a pre-operative scan. Then, a surgical assistant performs freehand US scanning during the surgery while the surgeon operates at the remote surgical console. Communicating the target scanning plane in the surgeon's mental map is difficult. Automatic image retrieval can help match intraoperative images to preoperative scans, guiding the assistant to adjust the US probe toward the target plane. Methods: We propose a self-supervised contrastive learning approach to match intraoperative US views to a preoperative image database. We introduce a novel contrastive learning strategy that leverages intra-sweep similarity and US probe location to improve feature encoding. Additionally, our model incorporates a flexible threshold to reject unsatisfactory matches. Results: Our method achieves 92.30% retrieval accuracy on simulated data and outperforms state-of-the-art temporal-based contrastive learning approaches. Our ablation study demonstrates that using probe location in the optimization goal improves image representation, suggesting that semantic information can be extracted from probe location. We also present our approach on real patient data to show the feasibility of the proposed US probe localization system despite tissue deformation from tongue retraction. Conclusion: Our contrastive learning method, which utilizes intra-sweep similarity and US probe location, enhances US image representation learning. We also demonstrate the feasibility of using our image retrieval method to provide neck US localization on real patient US after tongue retraction.
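
The retrieval step described in the abstract (match an intraoperative frame to a preoperative database and reject unsatisfactory matches) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes embeddings have already been produced by some image encoder, uses cosine similarity as the matching score, and replaces the paper's learnable threshold with a fixed scalar. The function name `retrieve_probe_location` and all parameter names are hypothetical.

```python
import numpy as np


def retrieve_probe_location(query_emb, db_embs, db_positions, reject_threshold):
    """Return (index, position, score) of the best database match, or None.

    query_emb:        (D,) embedding of the intraoperative frame (assumed given)
    db_embs:          (N, D) embeddings of the preoperative frames
    db_positions:     (N, 3) probe position recorded for each preoperative frame
    reject_threshold: fixed scalar standing in for the paper's learnable threshold
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)

    scores = db @ q                      # (N,) similarity to every database frame
    best = int(np.argmax(scores))        # index of the most similar frame

    # Reject the match when even the best score is below the threshold.
    if scores[best] < reject_threshold:
        return None
    return best, db_positions[best], float(scores[best])
```

A caller would pass the returned position to whatever guidance display is in use, and treat `None` as "no reliable match for this frame".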

Paper Structure

This paper contains 5 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The proposed workflow for using image retrieval in US scanning guidance. The surgeon selects the target view, and our method matches the current US image to the most similar view in the database to indicate the current probe location and the desired probe motion. This provides US scanning guidance to the surgical assistant without an external tracking system during surgery.
  • Figure 2: Summary of our intra-sweep training strategy. Image pairs are sampled from one US sweep and augmented into different views, and the image encoder predicts an embedding for each frame. The dot product of the embeddings is used to evaluate their similarity. A dustbin threshold is concatenated to the dot-product similarities to generate the final score matrix.
  • Figure 3: Example of the retrieved frames using our proposed method. The first four columns show correct retrievals and the last three columns show inaccurate matches.
  • Figure 4: Queries are sampled from the post-retraction US, and the database is the pre-retraction US. The blue trajectory is the scanning trajectory in the pre-retraction scan, and the red dot is the probe location localized by image retrieval.
  • Figure 5: Illustration of the TORS guidance application. The US image was acquired after tongue retraction, and the 3D patient model is extracted from pre-operative CT and aligned with the pre-operative 3D US. The segmentation (yellow: larynx cartilage, green: carotid, red: jugular vein) is roughly aligned with the 2D US.
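
The score matrix described in the Figure 2 caption (dot-product similarities with a dustbin threshold concatenated) might be built along these lines. This is a sketch under stated assumptions: the dustbin is appended as one extra column and is a fixed scalar here, whereas the paper describes it as learnable; how the score matrix enters the training loss is not covered. The names `score_matrix_with_dustbin` and `assign_matches` are hypothetical.

```python
import numpy as np


def score_matrix_with_dustbin(emb_a, emb_b, dustbin):
    """Build an (N, M + 1) score matrix from two sets of frame embeddings.

    emb_a:   (N, D) embeddings of one set of augmented frames
    emb_b:   (M, D) embeddings of the other set
    dustbin: scalar score for "no match" (assumed fixed in this sketch)
    """
    sim = emb_a @ emb_b.T                            # (N, M) dot-product similarity
    dust_col = np.full((sim.shape[0], 1), dustbin)   # one dustbin entry per row
    return np.concatenate([sim, dust_col], axis=1)   # dustbin is the last column


def assign_matches(scores):
    """For each row, return the matched column index, or None if the dustbin wins."""
    dustbin_index = scores.shape[1] - 1
    best = np.argmax(scores, axis=1)
    return [None if int(j) == dustbin_index else int(j) for j in best]
```

With this layout, a row is rejected exactly when none of its similarities exceeds the dustbin value, so raising the dustbin makes the matcher more conservative.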