PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation
Hania Ghouse, Maryam Alsharqi, Farhad R. Nezami, Muzammil Behzad
TL;DR
PULSE introduces a unified transformer-based framework that simultaneously performs ventricular segmentation, cardiomyopathy classification, and clinically grounded output generation across MRI and ultrasound modalities. It leverages a self-supervised DINOv2 backbone and a 4-scale pyramid decoder to learn robust, cross-domain cardiac priors, trained with a composite loss that couples segmentation and diagnosis. The approach achieves strong segmentation and classification on ACDC, generalizes to unseen MRI cohorts, and adapts to ultrasound with few-shot fine-tuning, demonstrating a foundation-style, cross-modality cardiac analysis pipeline. This work advances practical deployment by reducing annotation needs and enabling end-to-end clinical reasoning from pixels to narratives and indices.
Abstract
Cardiac image analysis remains fragmented across tasks: anatomical segmentation, disease classification, and grounded clinical report generation are typically handled by separate networks trained under different data regimes. No existing framework unifies these objectives within a single architecture while retaining generalization across imaging modalities and datasets. We introduce PULSE, a multi-task vision-language framework built on self-supervised representations and optimized through a composite supervision strategy that balances region-overlap learning, pixel-wise classification fidelity, and boundary-aware IoU refinement. A multi-scale token reconstruction decoder enables anatomical segmentation, while shared global representations support disease classification and clinically grounded text output, allowing the model to move from pixels to structures and finally to clinical reasoning within one architecture. Unlike prior task-specific pipelines, PULSE learns task-invariant cardiac priors, generalizes robustly across datasets, and can be adapted to new imaging modalities with minimal supervision. This moves the field closer to a scalable, foundation-style cardiac analysis framework.
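To make the composite supervision strategy concrete, the following is a minimal sketch of how a region-overlap term, a pixel-wise classification term, and an IoU term might be combined into one segmentation loss. It is an illustration under stated assumptions, not the authors' implementation: it treats a single binary mask flattened to a list, uses a Dice loss for the overlap term and binary cross-entropy for the pixel-wise term, and substitutes a plain soft IoU for the boundary-aware variant, whose exact form is not given here. The function names and the weights `w_dice`, `w_ce`, and `w_iou` are hypothetical.

```python
import math

def dice_loss(probs, target, eps=1e-6):
    # Region-overlap term: 1 - soft Dice coefficient.
    inter = sum(p * t for p, t in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def bce_loss(probs, target, eps=1e-7):
    # Pixel-wise classification term: mean binary cross-entropy.
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def soft_iou_loss(probs, target, eps=1e-6):
    # IoU term: 1 - soft intersection over union.
    # Assumption: a plain soft IoU stands in for the boundary-aware variant.
    inter = sum(p * t for p, t in zip(probs, target))
    union = sum(probs) + sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)

def composite_loss(probs, target, w_dice=1.0, w_ce=1.0, w_iou=1.0):
    # Weighted sum of the three terms; the weights are hypothetical.
    return (w_dice * dice_loss(probs, target)
            + w_ce * bce_loss(probs, target)
            + w_iou * soft_iou_loss(probs, target))

# Usage: a perfect prediction should score lower than an inverted one.
target = [1, 1, 0, 0]
loss_good = composite_loss([1.0, 1.0, 0.0, 0.0], target)
loss_bad = composite_loss([0.0, 0.0, 1.0, 1.0], target)
```

In a multi-task setting such as the one described, a term like this would typically be added to a classification loss on the shared global representation; how PULSE weights the two is not specified in this section.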
