From Foundation ECG Models to NISQ Learners: Distilling ECGFounder into a VQC Student

Giovanni dos Santos Franco, Felipe Mahlow, Ellison Fernando Cardoso, Felipe Fanchini

Abstract

Foundation models have recently improved electrocardiogram (ECG) representation learning, but their deployment can be limited by computational cost and latency constraints. In this work, we fine-tune ECGFounder as a high-capacity teacher for binary ECG classification on PTB-XL and the MIT-BIH Arrhythmia Database, and investigate whether knowledge distillation can transfer its predictive behavior to compact students. We evaluate two classical 1D students (ResNet-1D and a lightweight CNN-1D) and a quantum-ready pipeline that combines a convolutional autoencoder, which compresses 256-sample ECG windows into a low-dimensional latent representation, with a 6-qubit variational quantum circuit implemented in Qiskit and executed on a simulated backend. Across both datasets, the teacher provides the strongest overall performance, while distillation yields competitive students with a considerable reduction in trainable parameters. We further analyze the sensitivity of student performance to distillation settings, highlighting consistent accuracy--efficiency trade-offs when compressing a foundation ECG model into classical and quantum-ready learners under a unified evaluation protocol.
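The distillation settings referenced above, whose hyperparameters $\alpha$ and $T$ are analyzed in Figure 4, combine hard-label supervision with temperature-softened teacher logits. Below is a minimal sketch, assuming a standard Hinton-style objective in PyTorch; the function name, default values, and reduction choices are illustrative assumptions rather than the authors' implementation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Weighted sum of hard-label cross-entropy and a temperature-scaled soft-target term."""
    # Hard-label term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened student and teacher
    # distributions, rescaled by T^2 so its gradients are comparable to the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * soft + (1.0 - alpha) * hard
```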

Paper Structure

This paper contains 18 sections, 4 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Overview of the proposed pipeline. Left: a foundation model (ECGFounder) is fine-tuned to obtain a task-specific teacher. Center: a distillation loss combines hard labels and teacher logits to produce gradients. Right: three students (CNN, ResNet, and a compact VQC-based student) are trained under the same supervision signal for a controlled accuracy--efficiency comparison.
  • Figure 2: Quantum-student circuit used in this work. A 6-qubit ZZFeatureMap (green) is followed by a hardware-efficient EfficientSU2 ansatz (blue). All qubits are measured to form the output vector used for binary prediction/distillation. A minimal Qiskit sketch of this circuit appears after this list.
  • Figure 3: Model complexity comparison (log scale). ECGFounder is the high-capacity teacher, while ResNet, CNN, and Autoencoder+VQC are distilled students with substantially reduced parameter budgets.
  • Figure 4: Precision score as a function of the distillation hyperparameters $\alpha$ and temperature $T$ for PTB-XL and MIT-BIH, comparing VQC, CNN, and ResNet students. Solid lines denote PTB-XL and dashed lines denote MIT-BIH.
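For concreteness, the following is a minimal Qiskit sketch of the quantum-student circuit family described in Figure 2. The repetition counts, entanglement pattern, and variable names are assumptions; the figure specifies only the 6-qubit ZZFeatureMap, the EfficientSU2 ansatz, and measurement of all qubits.

```python
from qiskit.circuit.library import ZZFeatureMap, EfficientSU2

n_qubits = 6  # one qubit per latent feature from the convolutional autoencoder

# Data-encoding block: ZZFeatureMap embeds the 6-dimensional latent vector.
feature_map = ZZFeatureMap(feature_dimension=n_qubits, reps=1)

# Trainable block: hardware-efficient EfficientSU2 ansatz.
ansatz = EfficientSU2(n_qubits, reps=2)

# Compose encoding and ansatz, then measure every qubit to obtain the
# output vector used for binary prediction/distillation.
circuit = feature_map.compose(ansatz)
circuit.measure_all()
```

In practice the circuit parameters would be bound to the autoencoder latents and trained on a simulated backend, as described in the abstract; those execution details are omitted from this sketch.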