HeartFormer: Semantic-Aware Dual-Structure Transformers for 3D Four-Chamber Cardiac Point Cloud Reconstruction

Zhengda Ma, Abhirup Banerjee

TL;DR

This work introduces HeartFormer, the first geometric deep learning framework for multi-class 3D four-chamber cardiac reconstruction from cine MRI using point clouds. It combines a Semantic-Aware Dual-Structure Transformer (SA-DSTNet) for coarse semantic geometry with a two-stage Semantic-Aware Geometry-Feature Refinement Transformer (SA-GFRTNet) for progressive, anatomy-guided refinement, supervised by Semantic-Aware Chamfer Distance (SA-CD). A new large-scale HeartCompv1 dataset (17,000 samples) enables robust cross-domain evaluation alongside UK Biobank data, showing HeartFormer consistently outperforms state-of-the-art methods in geometric fidelity and anatomical coherence while using fewer parameters. The approach advances 3D/4D cardiac modeling from limited 2D cine-MRI data, with potential clinical impact in biomarker analysis and patient-specific simulations. Future work includes end-to-end integration with segmentation for fully automated reconstruction from 2D cine-MRI.

Abstract

We present the first geometric deep learning framework based on point cloud representation for 3D four-chamber cardiac reconstruction from cine MRI data. This work addresses a long-standing limitation of conventional cine MRI, which typically provides only 2D slice images of the heart and thereby restricts a comprehensive understanding of cardiac morphology and physiological mechanisms in both healthy and pathological conditions. To overcome this, we propose HeartFormer, a novel point cloud completion network that extends traditional single-class point cloud completion to the multi-class setting. HeartFormer consists of two key components: a Semantic-Aware Dual-Structure Transformer Network (SA-DSTNet) and a Semantic-Aware Geometry Feature Refinement Transformer Network (SA-GFRTNet). SA-DSTNet generates an initial coarse point cloud with both global geometry features and substructure geometry features. Guided by these semantic-geometry representations, SA-GFRTNet progressively refines the coarse output, leveraging both global and substructure geometric priors to produce high-fidelity and geometrically consistent reconstructions. We further construct HeartCompv1, the first publicly available large-scale dataset with 17,000 high-resolution 3D multi-class cardiac meshes and point clouds, to establish a general benchmark for this emerging research direction. Extensive cross-domain experiments on HeartCompv1 and UK Biobank demonstrate that HeartFormer achieves robust, accurate, and generalizable performance, consistently surpassing state-of-the-art (SOTA) methods. Code and dataset will be released upon acceptance at: https://github.com/10Darren/HeartFormer.
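
To make the step from single-class to multi-class completion concrete, the following is a minimal sketch of a class-wise Chamfer distance in plain Python. It is not the paper's Semantic-Aware Chamfer Distance, whose exact definition is not given in this summary. It assumes that every point carries an integer chamber label, that the un-squared Euclidean variant of the Chamfer distance is used, and that per-class values are averaged with equal weight. The names `chamfer_distance` and `classwise_chamfer` are hypothetical.

```python
import math
from collections import defaultdict


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty lists of 3D points.

    Assumption: un-squared Euclidean distances, averaged per direction.
    Brute force, O(len(a) * len(b)); fine for a small illustration only.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def classwise_chamfer(pred_pts, pred_labels, gt_pts, gt_labels):
    """Average the Chamfer distance over the classes present in the ground truth.

    Returns (mean over classes, dict of per-class distances).
    """
    pred_by_class = defaultdict(list)
    gt_by_class = defaultdict(list)
    for point, label in zip(pred_pts, pred_labels):
        pred_by_class[label].append(point)
    for point, label in zip(gt_pts, gt_labels):
        gt_by_class[label].append(point)

    per_class = {}
    for label, gt_c in gt_by_class.items():
        pred_c = pred_by_class.get(label)
        if not pred_c:
            # Hypothetical policy: a missing class is treated as an error.
            raise ValueError(f"no predicted points for class {label}")
        per_class[label] = chamfer_distance(pred_c, gt_c)

    return sum(per_class.values()) / len(per_class), per_class


# Same geometry, but the two labels are swapped in the prediction.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
plain = chamfer_distance(pts, pts)
mean_cd, per_class = classwise_chamfer(pts, [1, 0], pts, [0, 1])
```

In this toy case the class-agnostic distance `plain` is zero because the two point sets coincide, while the class-wise mean is non-zero because each point is assigned to the wrong chamber. A label-aware loss can penalize this kind of failure to separate substructures.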

Paper Structure

This paper contains 76 sections, 23 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: Two sparse input point clouds, one with no misalignment and one with severe misalignment. Qualitative reconstruction results are shown in rows 1 and 3; advanced single-class methods (b,c) tend to generate the entire structure and fail to separate substructures. Rows 2 and 4 visualize the results with point colors indicating the Chamfer distances to the corresponding ground truth (GT) (f). Our proposed HeartFormer (e) achieves significantly better structural fidelity compared to SOTA methods (b-d).
  • Figure 2: Overview of the proposed HeartFormer architecture. HeartFormer first employs an SA-DSTNet, which integrates a Semantic-Aware Global Structure Aggregation (SA-GSA) module and a Semantic-Aware Substructure Aggregation (SA-SSA) module to generate a coarse point cloud, $\mathbf{P}_{\text{coarse}}$, by jointly modeling global context and substructure semantics. A subsequent two-stage SA-GFRTNet progressively enhances geometric detail and anatomical consistency, yielding the final high-fidelity reconstruction, $\mathbf{P}_{\text{fine}}$.
  • Figure 3: Comparison of reconstruction results under five different levels of misalignment in the HeartCompv1 dataset. Sparse Input denotes the partial point clouds with varying degrees of misalignment. PCN, FBNet, AdaPoinTr, ODGNet, PointAttN, and SymmCompletion are single-class point cloud completion models, while PCCN and the proposed HeartFormer are multi-class completion models. GT indicates the ground truth.
  • Figure 4: Comparison of reconstruction results under five different levels of misalignment in the HeartCompv1 dataset. The point colors visualize the Chamfer Distance between the reconstructed point cloud and the corresponding ground truth.
  • Figure 5: Comparison of reconstruction results for four subjects from the UK Biobank dataset. The reconstructions produced by HeartFormer more closely resemble the gold standard than those generated by PCCN.
  • ...and 8 more figures
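
The error maps described for Figures 1 and 4 color each reconstructed point by its distance to the ground truth. A minimal sketch of how such per-point values could be computed is given below. It assumes that the per-point value is the Euclidean distance to the nearest ground-truth point and that values are scaled to [0, 1] before a colormap is applied. The figures' actual procedure is not stated here, and the names `per_point_error` and `normalise` are hypothetical.

```python
import math


def per_point_error(recon, gt):
    """Distance from each reconstructed point to its nearest ground-truth point."""
    return [min(math.dist(p, q) for q in gt) for p in recon]


def normalise(values):
    """Scale errors to [0, 1] so they can be fed to any colormap."""
    hi = max(values)
    return [v / hi if hi > 0 else 0.0 for v in values]


recon = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
errors = per_point_error(recon, gt)
colors = normalise(errors)
```

Averaging `errors` gives one direction of the Chamfer distance (reconstruction to ground truth), which is why a per-point map of this kind shows where a reconstruction contributes most to the overall metric.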