
PIPsUS: Self-Supervised Point Tracking in Ultrasound

Wanwen Chen, Adam Schmidt, Eitan Prisman, Septimiu E Salcudean

TL;DR

Ultrasound (US) point tracking for intraoperative guidance is hindered by domain shift and scarce labeled data. The authors propose PIPsUS, a self-supervised, streaming particle-based tracker that can follow an arbitrary number of points across multiple frames in US sequences without manual labels. The method combines a history-aware feature encoder, correlation-based matching, and iterative updates, trained with a teacher-student scheme that uses PIPs++ trajectories and simulated transformations as pseudo-ground-truth. On neck/oral US and echocardiography data, PIPsUS outperforms fast normalized cross-correlation (NCC) and fine-tuned RAFT, while enabling online, low-memory operation suitable for intraoperative use.

Abstract

Finding point-level correspondences is a fundamental problem in ultrasound (US), since it can enable US landmark tracking for intraoperative image guidance in different surgeries, including head and neck surgery. Most existing US tracking methods, e.g., those based on optical flow or feature matching, were initially designed for RGB images before being applied to US, so domain shift can degrade their performance. Training could be supervised by ground-truth correspondences, but these are expensive to acquire in US. To solve these problems, we propose a self-supervised pixel-level tracking model called PIPsUS. Our model can track an arbitrary number of points in one forward pass and exploits temporal information by considering multiple frames instead of just consecutive ones. We developed a new self-supervised training strategy that uses a long-term point-tracking model trained on RGB images as a teacher to guide the model to learn realistic motions, and uses data augmentation to enforce tracking from US appearance. We evaluate our method on neck and oral US and echocardiography, showing higher point-tracking accuracy than fast normalized cross-correlation and tuned optical flow. Code will be available once the paper is accepted.

Paper Structure

This paper contains 6 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: PIPsUS architecture. PIPsUS enables streaming evaluation of point motion, estimating point motion at time $t$ from the motion and image-feature history. The model encodes the history and current images and samples the features of the tracked points on the history feature maps. The correlation maps between the history features and the current feature map are concatenated with the history motion. A 1D ResNet encodes this information, and a linear layer iteratively predicts the tracking update.
  • Figure 2: L2 error across frames on real US sequences. Left: OUS; right: EchoNet. The line is the mean L2 error and the shaded band spans the 10th to 90th percentiles.
  • Figure 3: Examples of tracked point trajectories in different frames on OUS (top 2 rows) and EchoNet (bottom 2 rows). Points mark the current predicted keypoint locations and colored lines show the trajectory history. On OUS, in Frame 20 of PIPsUScorr and NCC, a point is correlated to a faraway location. By using point motion history, PIPsUS avoids this.
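
The Figure 1 caption describes sampling a tracked point's feature from a history feature map and correlating it with the current feature map. The NumPy sketch below illustrates that single step only, under stated assumptions: the function names `bilinear_sample` and `local_correlation`, the `(C, H, W)` feature layout, the cosine-similarity score, and the `radius` parameter are all hypothetical choices made for illustration and are not taken from the authors' implementation.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Sample a (C, H, W) feature map at a sub-pixel location (x, y)."""
    _, H, W = feat.shape
    x = float(np.clip(x, 0, W - 1))
    y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bot = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bot


def local_correlation(point_feat, cur_feat, cx, cy, radius=3):
    """Cosine similarity between one point feature and a (2r+1, 2r+1)
    window of the current feature map centred at integer (cx, cy).
    Assumption: cosine similarity stands in for the correlation score."""
    _, H, W = cur_feat.shape
    size = 2 * radius + 1
    corr = np.zeros((size, size), dtype=np.float64)
    q = point_feat / (np.linalg.norm(point_feat) + 1e-8)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp to the image border so the window never leaves the map.
            yy = min(max(cy + dy, 0), H - 1)
            xx = min(max(cx + dx, 0), W - 1)
            k = cur_feat[:, yy, xx]
            corr[dy + radius, dx + radius] = q @ (k / (np.linalg.norm(k) + 1e-8))
    return corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hist_feat = rng.normal(size=(8, 32, 32))
    # Simulate motion: the current map is the history map shifted by
    # 1 pixel in y and 2 pixels in x.
    cur_feat = np.roll(hist_feat, shift=(1, 2), axis=(1, 2))
    point = bilinear_sample(hist_feat, x=10, y=12)
    corr = local_correlation(point, cur_feat, cx=10, cy=12, radius=3)
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    print("estimated shift (dx, dy):", dx - 3, dy - 3)
```

In this toy setup the correlation peak sits at the simulated displacement. According to the caption, the model does not read off the peak directly: it concatenates such correlation maps with the motion history and predicts the update iteratively, which this sketch does not attempt to reproduce.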