Intraoperative 2D/3D Registration via Spherical Similarity Learning and Differentiable Levenberg-Marquardt Optimization

Minheng Chen, Youyong Kong

TL;DR

This work extracts feature embeddings using a CNN-Transformer encoder, projects them into spherical space, and approximates their geodesic distances with Riemannian distances in the bi-invariant SO(4) space, enabling a more expressive and geometrically consistent deep similarity metric.

Abstract

Intraoperative 2D/3D registration aligns preoperative 3D volumes with real-time 2D radiographs, enabling accurate localization of instruments and implants. A recent fully differentiable similarity learning framework approximates geodesic distances on SE(3), expanding the capture range of registration and mitigating the effects of substantial disturbances, but existing Euclidean approximations distort manifold structure and slow convergence. To address these limitations, we explore similarity learning in non-Euclidean spherical feature spaces to better capture and fit complex manifold structure. We extract feature embeddings using a CNN-Transformer encoder, project them into spherical space, and approximate their geodesic distances with Riemannian distances in the bi-invariant SO(4) space. This enables a more expressive and geometrically consistent deep similarity metric, enhancing the ability to distinguish subtle pose differences. During inference, we replace gradient descent with fully differentiable Levenberg-Marquardt optimization to accelerate convergence. Experiments on real and synthetic datasets show superior accuracy in both patient-specific and patient-agnostic scenarios.
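To make the idea of a spherical feature space concrete, the following is a minimal sketch, not the authors' implementation: it L2-normalizes embedding vectors onto the unit hypersphere and measures the great-circle (geodesic) distance between them. The function names `project_to_sphere` and `geodesic_distance` are hypothetical, the embeddings are hand-written stand-ins for encoder outputs, and the plain arc-length distance used here is an assumption that stands in for the Riemannian distance on SO(4) described in the abstract.

```python
import numpy as np


def project_to_sphere(x, eps=1e-12):
    """L2-normalize vectors along the last axis so they lie on the unit hypersphere."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    # Guard against division by zero for all-zero embeddings.
    return x / np.maximum(norms, eps)


def geodesic_distance(u, v):
    """Great-circle distance in radians between unit vectors u and v."""
    cos_angle = np.sum(u * v, axis=-1)
    # Clip to [-1, 1] so rounding error cannot push arccos out of its domain.
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))


# Toy 4-D embeddings (illustrative values only, not encoder outputs).
a = project_to_sphere([3.0, 0.0, 0.0, 0.0])
b = project_to_sphere([0.0, 2.0, 0.0, 0.0])

d_ab = geodesic_distance(a, b)  # orthogonal directions
d_aa = geodesic_distance(a, a)  # identical directions
```

Because the distance depends only on the angle between the normalized vectors, the scale of the raw embeddings drops out, which is one reason a spherical space can be a convenient place to compare features.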

Paper Structure

This paper contains 18 sections, 17 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of the proposed framework. We first employ a regressor to initialize the pose and then refine it using differentiable Levenberg-Marquardt optimization based on spherical similarity learning. Spherical similarity learning consists of two main components: extracting image feature representations using CNN-Transformer encoders $\xi(\cdot)$ and projecting these embeddings into hypersphere space, where the geodesic distance between them is computed as a measure of deep similarity. During training, we enforce the gradient of this deep similarity with respect to $\theta$ to approximate the gradient of the geodesic distance between $\theta$ and the ground truth $\theta_{gt}$ in SE(3).
  • Figure 2: Qualitative results. We visualize the registration results of the proposed method and the baselines on the DeepFluoro and Ljubljana datasets under the patient-specific scenario. In each example, the top row shows the ground truth X-ray image alongside the corresponding DRR generated from the pose estimated by each method. The bottom row displays the difference map, demonstrating the alignment between the DRR at the ground truth pose (red) and the DRR at the estimated pose (blue).
  • Figure 3: Visual comparison of the proposed spherical deep similarity landscape in $\mathfrak{se}(3)$ (top) and SO(4) (bottom). For clearer visualization, the deep similarity values are first normalized to the range $[0, 1]$ and then transformed by computing $1-\epsilon$, which inverts the scale to enhance contrast in the display.
  • Figure 4: Left: comparison of training time and mTRE for position regressors with different backbone architectures. The reported training time is the time at which the standard deviation of the model's loss function over the last ten epochs falls below 10e-4. Right: comparison of the convergence speed of the proposed framework using different gradient-based optimization methods in the inference phase.
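
The right panel of Figure 4 compares gradient-based optimizers at inference time. For readers unfamiliar with Levenberg-Marquardt, the sketch below shows the generic damped Gauss-Newton update on a toy curve-fitting problem. It is an illustration under stated assumptions rather than the paper's optimizer: the function `levenberg_marquardt`, the damping schedule (shrink by 0.3 on an accepted step, grow by 3 on a rejected one), and the exponential model are all hypothetical choices, and the paper applies a differentiable variant to pose parameters with a learned similarity rather than to an analytic residual.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, x0, lam=1e-2, iters=100, tol=1e-12):
    """Minimize 0.5 * ||residual(x)||^2 with a basic Levenberg-Marquardt loop."""
    x = np.asarray(x0, dtype=float)
    cost = 0.5 * np.sum(residual(x) ** 2)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        # Damped normal equations: (J^T J + lam * I) step = -J^T r
        A = J.T @ J + lam * np.eye(x.size)
        step = np.linalg.solve(A, -J.T @ r)
        x_new = x + step
        new_cost = 0.5 * np.sum(residual(x_new) ** 2)
        if new_cost < cost:
            # Accept the step and trust the quadratic model more (less damping).
            x, cost = x_new, new_cost
            lam *= 0.3
            if np.linalg.norm(step) < tol:
                break
        else:
            # Reject the step and move toward gradient descent (more damping).
            lam *= 3.0
    return x


# Toy problem (hypothetical): recover a and b in y = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)


def residual(x):
    a, b = x
    return a * np.exp(b * t) - y


def jacobian(x):
    a, b = x
    e = np.exp(b * t)
    # Columns are d(residual)/da and d(residual)/db.
    return np.stack([e, a * t * e], axis=1)


x_hat = levenberg_marquardt(residual, jacobian, x0=[1.0, 0.0])
```

The damping term `lam` interpolates between Gauss-Newton (small `lam`, fast near the optimum) and a short gradient-descent-like step (large `lam`, safer far from it), which is the usual explanation for why Levenberg-Marquardt tends to need fewer iterations than plain gradient descent on least-squares problems.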