Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography

Lin Zhao, Xin Yu, Yikang Liu, Xiao Chen, Eric Z. Chen, Terrence Chen, Shanhui Sun

TL;DR

This work tackles correspondence matching in coronary angiography for 3D reconstruction in two ways: it synthesizes realistic paired X-ray images with a diffusion model conditioned on 2D masks projected from CCTA-derived 3D meshes, and it guides feature aggregation with large-scale image foundation models. The proposed pipeline combines a synthetic data engine, a dense correspondence framework with self- and cross-attention, and foundation-model guidance that focuses the matcher on semantically relevant vessel regions and keypoints. Results on both synthetic and real datasets show superior matching performance and better generalization, and ablations confirm the critical role of foundation-model guidance. The approach offers a practical, generalizable solution for camera calibration and 3D vascular reconstruction in CAD assessment, along with insights into using image foundation models for medical imaging tasks.

Abstract

Accurate correspondence matching in coronary angiography images is crucial for reconstructing 3D coronary artery structures, which is essential for precise diagnosis and treatment planning of coronary artery disease (CAD). Traditional matching methods for natural images often fail to generalize to X-ray images due to inherent differences such as lack of texture, lower contrast, and overlapping structures, compounded by insufficient training data. To address these challenges, we propose a novel pipeline that generates realistic paired coronary angiography images using a diffusion model conditioned on 2D projections of 3D reconstructed meshes from Coronary Computed Tomography Angiography (CCTA), providing high-quality synthetic data for training. Additionally, we employ large-scale image foundation models to guide feature aggregation, enhancing correspondence matching accuracy by focusing on semantically relevant regions and keypoints. Our approach demonstrates superior matching performance on synthetic datasets and effectively generalizes to real-world datasets, offering a practical solution for this task. Furthermore, our work investigates the efficacy of different foundation models in correspondence matching, providing novel insights into leveraging advanced image foundation models for medical imaging applications.
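
The idea of obtaining ground-truth 2D correspondences from a shared 3D structure can be illustrated with a minimal sketch. It assumes a simple pinhole camera and a rotation about a single axis as a stand-in for a change in viewing angle. The function names, focal length, source distance, and random points below are hypothetical and are not taken from the paper.

```python
import numpy as np

def rotation_y(angle_deg):
    """Rotation matrix about the y-axis (stand-in for a change of view angle)."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def project(points, angle_deg, focal=1000.0, distance=800.0, center=256.0):
    """Pinhole projection of (N, 3) points after rotating the scene.

    focal, distance and center are illustrative values, not calibrated ones.
    """
    cam = points @ rotation_y(angle_deg).T  # new array, input is not modified
    cam[:, 2] += distance                   # place the object in front of the camera
    u = focal * cam[:, 0] / cam[:, 2] + center
    v = focal * cam[:, 1] / cam[:, 2] + center
    return np.stack([u, v], axis=1)

# Hypothetical 3D points standing in for vertices of a CCTA-derived mesh.
rng = np.random.default_rng(0)
points_3d = rng.uniform(-40.0, 40.0, size=(5, 3))

view_a = project(points_3d, angle_deg=0.0)
view_b = project(points_3d, angle_deg=30.0)

# Row i of view_a and row i of view_b come from the same 3D point,
# so the shared index is the ground-truth correspondence.
matches = np.stack([np.arange(len(points_3d))] * 2, axis=1)
```

Because both projections index the same 3D points, no matching step is needed to label the synthetic pairs; the labels follow directly from the geometry.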

Paper Structure

This paper contains 27 sections, 16 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Synthetic data generation. The mesh reconstructed from CCTA is projected as 2D masks from different angles. X-ray images are generated by a diffusion model conditioned on these masks. Correspondence of 2D keypoints is established by linking the corresponding points in 3D space.
  • Figure 2: The proposed matching framework. The paired images are first input into image encoders to extract local descriptors (from the SuperPoint model) and global descriptors (from the foundation model). Keypoints and local descriptors are fed into $N$ consecutive feature aggregation blocks comprising self-attention and cross-attention layers. The global features are used to construct a similarity matrix and mask, guiding the cross-attention operation. This process filters out semantically irrelevant keypoints in feature aggregation, allowing the model to focus more on relevant regions and points.
  • Figure 3: Illustration of the masks and generated images from four randomly selected cases. The first row shows masks projected from the 3D mesh for the two LCA and two RCA cases, which serve as the conditioning input for the IDDPM model. The second row displays images generated by the trained IDDPM model.
  • Figure 4: Visualization of the keypoints matched by our method for 12 randomly selected pairs from the real dataset. Matched keypoints are shown as green dots, with correspondences indicated by green lines. The mean epipolar error for each image pair is shown below each sub-figure.
  • Figure 5: Visualization of 3-dimensional PCA applied to the features of various foundation models.
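
The guidance mechanism described for Figure 2 (a similarity matrix and mask built from global features, restricting cross-attention) might look roughly like the sketch below. This is an assumption-laden illustration, not the paper's implementation: the cosine similarity, the fixed threshold, the fallback to the single most similar key, and the omission of learned query/key/value projections are all choices made here for brevity, and `guided_cross_attention` is a hypothetical name.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def guided_cross_attention(local_a, local_b, global_a, global_b, threshold=0.5):
    """Cross-attention from keypoints in image A to keypoints in image B,
    restricted by a mask derived from global-feature similarity.

    local_*:  (N, d) local descriptors per keypoint.
    global_*: (N, D) foundation-model features sampled at the same keypoints.
    """
    # Cosine similarity between global features of every keypoint pair.
    sim = l2_normalize(global_a) @ l2_normalize(global_b).T  # (Na, Nb)
    mask = sim >= threshold

    # Assumption: always keep the most similar key so no row is fully masked.
    mask[np.arange(sim.shape[0]), sim.argmax(axis=1)] = True

    # Scaled dot-product scores; masked pairs get zero attention weight.
    scores = local_a @ local_b.T / np.sqrt(local_a.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ local_b, mask

# Toy inputs with random values, only to show the expected shapes.
rng = np.random.default_rng(0)
local_a, local_b = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
global_a, global_b = rng.normal(size=(4, 16)), rng.normal(size=(6, 16))
aggregated, mask = guided_cross_attention(local_a, local_b, global_a, global_b)
```

Keypoints in the second image whose global features are dissimilar to a query receive zero weight, which is one way to realize the filtering of semantically irrelevant keypoints that the caption describes.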