Catheter Detection and Segmentation in X-ray Images via Multi-task Learning

Lin Xi, Yingliang Ma, Ethan Koland, Sandra Howell, Aldo Rinaldi, Kawal S. Rhode

TL;DR

This work tackles automated, real-time detection and segmentation of catheters in X-ray fluoroscopy for cardiac interventions. It introduces a unified encoder–decoder CNN based on a ResNet34 backbone with an attention module and three prediction heads, coupled with a multi-level dynamic resource prioritization strategy that allocates learning resources across samples and tasks using KPI-driven weights. The approach achieves state-of-the-art performance for both detection and segmentation on public and private datasets, while delivering real-time inference at 37 FPS. This combination of accuracy and efficiency supports improved image guidance, potential motion compensation, and integration into robotic or semi-automated surgical workflows.

Abstract

Automated detection and segmentation of surgical devices, such as catheters or wires, in X-ray fluoroscopic images have the potential to enhance image guidance in minimally invasive heart surgeries. In this paper, we present a convolutional neural network model that integrates a ResNet architecture with multiple prediction heads to achieve real-time, accurate localization of electrodes on catheters and catheter segmentation in an end-to-end deep learning framework. We also propose a multi-task learning strategy in which our model is trained to perform electrode detection and catheter segmentation simultaneously. A key challenge with this approach is achieving optimal performance for both tasks. To address this, we introduce a novel multi-level dynamic resource prioritization method. This method dynamically adjusts sample and task weights during training to prioritize more challenging tasks, where task difficulty is inversely proportional to performance and evolves throughout the training process. Experiments on both public and private datasets demonstrate that our method is more accurate than existing state-of-the-art methods, both on the single segmentation task and on the joint detection and segmentation task. Our approach achieves a good trade-off between accuracy and efficiency, making it well-suited for real-time surgical guidance applications.
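
The task-level part of this idea can be illustrated with a minimal sketch. It is not the paper's formula: it assumes each task reports a KPI in [0, 1], takes difficulty as `1 - KPI` (a simple decreasing function chosen for illustration), and normalizes the weights so they sum to the number of tasks. The names `task_weights` and `weighted_loss` and the example KPI values are hypothetical.

```python
def task_weights(kpis, eps=1e-8):
    """Map per-task KPIs in [0, 1] to loss weights that favour weaker tasks.

    Assumption: difficulty = 1 - KPI; weights are normalized to sum to the
    number of tasks so the overall loss scale stays roughly constant.
    """
    difficulties = {name: 1.0 - kpi for name, kpi in kpis.items()}
    total = sum(difficulties.values()) + eps
    n = len(difficulties)
    return {name: n * d / total for name, d in difficulties.items()}


def weighted_loss(losses, weights):
    """Combine per-task losses using the current task weights."""
    return sum(weights[name] * losses[name] for name in losses)


# Hypothetical KPIs measured at some point during training.
weights = task_weights({"detection": 0.9, "segmentation": 0.6})
total = weighted_loss({"detection": 0.5, "segmentation": 1.0}, weights)
```

Recomputing the weights whenever the KPIs are re-evaluated lets the prioritization follow the tasks as their relative difficulty changes during training.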

Paper Structure

This paper contains 10 sections, 8 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Overview of our model. Given an X-ray image $\bm{I}$, an encoder embeds it into a 512-dimensional feature $\bm{F}$, which is fed into an attention module to produce the enhanced feature $\bm{F}'$. The enhanced feature $\bm{F}'$ is then fed into a decoder that recovers resolution in a top-down manner with a skip connection. Finally, the decoder output is passed to the segmentation, center, and size prediction heads to obtain the final results.
  • Figure 2: Qualitative results of the proposed method on challenging scenarios from the catheter detection and segmentation dataset. The green crosses are the positions of electrodes. The orange mask indicates the segmentation results of the catheter.
  • Figure 3: Difficulty change curve of the selected samples during the training process. The x-axis represents the training iterations, and the y-axis represents the difficulty of the samples. The blue, orange, and green lines represent the easy, medium, and hard samples, respectively. The red mask represents the top-k filtering threshold, which retains 70% of the samples. (a) The difficulty curve of the samples without momentum update. (b) The difficulty curve of the samples with momentum update.
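
The two mechanisms named in the Figure 3 caption, a momentum update of sample difficulty and top-k filtering, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the momentum update is taken to be an exponential moving average, the momentum value 0.9 is arbitrary, and the filter is assumed to keep the most difficult 70% of samples (the caption gives the ratio but not which side is retained). All function names and values are hypothetical.

```python
def momentum_update(previous, observed, momentum=0.9):
    """Smooth a sample's difficulty with an exponential moving average.

    Assumption: this EMA form and momentum=0.9 are illustrative choices.
    """
    return momentum * previous + (1.0 - momentum) * observed


def top_k_keep(difficulties, keep_ratio=0.7):
    """Return the ids of the retained samples.

    Assumption: the highest-difficulty samples are the ones kept.
    """
    k = max(1, int(round(keep_ratio * len(difficulties))))
    ranked = sorted(difficulties, key=difficulties.get, reverse=True)
    return set(ranked[:k])


# A noisy per-iteration difficulty signal for one hypothetical sample.
observed = [0.2, 0.8, 0.2, 0.8]
smoothed = []
current = 0.5
for value in observed:
    current = momentum_update(current, value)
    smoothed.append(current)

# Ten hypothetical samples with difficulties 0.0, 0.1, ..., 0.9.
kept = top_k_keep({i: i / 10 for i in range(10)})
```

With these values the raw signal swings between 0.2 and 0.8 while the smoothed difficulty stays close to 0.5, which matches the qualitative contrast the caption draws between panels (a) and (b).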