Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion

Minheng Chen, Zhirun Zhang, Shuheng Gu, Zhangyang Ge, Youyong Kong

TL;DR

This work tackles rigid 2D/3D X-ray to CT registration by addressing interpretability, controllability, and limited capture range in fully differentiable methods. It introduces a correlation-driven network with a dual-branch CNN-Transformer encoder that decouples low-frequency global and high-frequency local features, coupled with a correlation-based decomposition loss and a training strategy that approximates a convex similarity function. The framework predicts a relative SE(3) pose and performs gradient-based iterative refinement, guided by a loss that combines $L_{appro}$ and $L_{decomp}$ with learnable uncertainty parameters and NCC-based feature decomposition. Evaluations on a spine-focused, simulated X-ray dataset demonstrate improved accuracy and robustness over CMA-ES baselines and existing differentiable methods, suggesting enhanced interpretability and capture range for clinical image-guided interventions. The approach has potential to improve real-time registration performance and reliability in fluoroscopic procedures.
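The two ingredients named above, normalized cross-correlation (NCC) and a loss that balances $L_{appro}$ and $L_{decomp}$ through learnable uncertainty parameters, can be illustrated with a minimal NumPy sketch. The exact formulation used in the paper is not given in this summary, so the weighting below assumes the common log-variance form $e^{-s_i} L_i + s_i$; the function names `ncc` and `uncertainty_weighted_loss` and the parameters `s_appro` and `s_decomp` are hypothetical.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Zero-mean normalized cross-correlation of two equally shaped arrays.

    Returns a value in [-1, 1]; 1 means the arrays are related by a positive
    affine intensity change, -1 by a negative one.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def uncertainty_weighted_loss(l_appro, l_decomp, s_appro, s_decomp):
    """Combine two loss terms with log-variance weights (ASSUMED form).

    Each term is scaled by exp(-s_i), and s_i is added as a regularizer so the
    weights cannot grow without bound. In training, s_appro and s_decomp would
    be learnable scalars; here they are plain floats.
    """
    return float(np.exp(-s_appro) * l_appro + s_appro
                 + np.exp(-s_decomp) * l_decomp + s_decomp)
```

With both log-variances at zero the combination reduces to a plain sum of the two terms; raising one of them down-weights the corresponding loss at the cost of the additive penalty.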

Abstract

Image-based rigid 2D/3D registration is a critical technique for fluoroscopy-guided surgical interventions. In recent years, some learning-based fully differentiable methods have produced beneficial outcomes, but their feature extraction and gradient flow still lack controllability and interpretability. To alleviate these problems, we propose a novel fully differentiable correlation-driven network with a dual-branch CNN-Transformer encoder, which enables the network to extract low-frequency global features and separate them from high-frequency local features. We further propose a correlation-driven loss that decomposes low-frequency and high-frequency features based on embedded information. In addition, we apply a training strategy that learns to approximate a convex-shaped similarity function. We test our approach on an in-house dataset and show that it outperforms both existing fully differentiable learning-based registration approaches and the conventional optimization-based baseline.

Paper Structure (12 sections, 7 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Overall architecture of our method. (a) The proposed framework, which is trained to predict a relative SE(3) transformation that can be applied in iterative 2D/3D registration. (b) The encoder, which consists of three components: the shallow share feature encoder (SFE), the global-local feature decomposition (GLD) layer, and the similarity evaluation (SE) layer.
  • Figure 2: Qualitative results on a test dataset using our proposed method. The columns show: (a) fixed images; (b) overlay images at the initial pose; (c) overlay results after applying the proposed method; (d) visualization results after applying the proposed method and CMA-ES. The overlay images are created by superimposing the DRR-derived edges, highlighted in green, on the fixed images.
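The caption of Figure 1(a) describes a relative SE(3) transformation that is applied repeatedly during registration. As a rough illustration of what applying such an update means, the sketch below builds a 4x4 homogeneous transform from an axis-angle rotation and a translation (Rodrigues' formula) and composes it onto a current pose estimate. The helper names and the left-multiplication convention are assumptions made for this example; the paper's own pose parameterization is not stated in this summary.

```python
import numpy as np

def se3_from_rotvec(rotvec, translation):
    """Build a 4x4 rigid transform from an axis-angle vector and a translation."""
    rotvec = np.asarray(rotvec, dtype=np.float64)
    theta = np.linalg.norm(rotvec)
    K = np.zeros((3, 3))
    if theta > 1e-12:
        k = rotvec / theta
        # Skew-symmetric cross-product matrix of the unit rotation axis
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = translation
    return T

def apply_relative_update(T_current, T_relative):
    """Compose a relative update onto the current pose (left-multiplication ASSUMED)."""
    return T_relative @ T_current

# Two successive 45-degree updates about the z axis amount to a 90-degree rotation.
step = se3_from_rotvec([0.0, 0.0, np.pi / 4], [0.0, 0.0, 0.0])
pose = np.eye(4)
for _ in range(2):
    pose = apply_relative_update(pose, step)
```

In an actual registration loop the relative transform would come from the trained network or from a gradient step on the learned similarity, rather than being fixed as it is here.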