HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling

Daniel Duenias, Brennan Nichyporuk, Tal Arbel, Tammy Riklin Raviv

TL;DR

HyperFusion introduces a hypernetwork-based fusion framework that conditions MRI analysis on tabular EHR data to improve predictive accuracy in multimodal brain tasks. A hypernetwork embeds the tabular attributes and generates the parameters of selected ("external") layers of a CNN/ResNet backbone, so the image-processing network adapts to each input's tabular data during both training and inference. The approach is validated on two brain MRI tasks (brain age prediction conditioned on sex, and multiclass AD classification conditioned on tabular data), where HyperFusion outperforms single-modality models and existing MRI-tabular fusion methods, with supporting ablation studies and cross-validation. The work shows that hypernetworks are a practical and flexible way to integrate heterogeneous clinical data, with potential to extend to broader multimodal medical decision-making scenarios.

Abstract

The integration of diverse clinical modalities such as medical imaging and the tabular data extracted from patients' Electronic Health Records (EHRs) is a crucial aspect of modern healthcare. Integrative analysis of multiple sources can provide a comprehensive understanding of a patient's clinical condition, improving diagnosis and treatment decisions. Deep Neural Networks (DNNs) consistently demonstrate outstanding performance in a wide range of multimodal tasks in the medical domain. However, effectively merging medical imaging with clinical, demographic and genetic information represented as numerical tabular data remains an active research problem. We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements. This approach aims to leverage the complementary information present in these modalities to enhance the accuracy of various medical applications. We demonstrate the strength and generality of our method on two different brain Magnetic Resonance Imaging (MRI) analysis tasks, namely, brain age prediction conditioned on the subject's sex and multi-class Alzheimer's Disease (AD) classification conditioned on tabular data. We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular data fusion methods. A link to our code can be found at https://github.com/daniel4725/HyperFusion

Paper Structure (47 sections, 10 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: An illustration of the proposed HyperFusion framework. The two main components, the hypernetwork and the primary network, are shown in the upper and lower parts of the figure, respectively. A: The inputs $T$ and $I$ denote tabular and imaging data, respectively. B: The hypernetwork $\mathcal{H}_\phi$ is composed of $K$ individual networks, $\{h_k\}_{k=1, \ldots, K}$, which generate parameters $h_k(T)=\theta_{h_k}$ for specific (external) layers of the primary network $\mathcal{P}_\theta$ (red arrows). C: The primary network is composed of internal layers, which are updated throughout the backpropagation process (yellow arrows), and external layers (marked in red).
  • Figure 2: HyperFusion architecture for conditioned brain age prediction. A: The inputs include the subject's sex (encoded as a 2D one-hot vector) and the corresponding 3D brain MRI. B: The primary network backbone is a variant of the VGG architecture (Simonyan and Zisserman, 2014), where the parameters of its final four linear layers (framed in red) are external and generated by the hypernetwork. C: A closer look at a convolutional block. D: The hypernetwork comprises four sub-networks ($h_1, h_2, h_3, h_4$), each corresponding to one of the linear layers in the primary network.
  • Figure 3: HyperFusion architecture for AD classification. A: The input consists of the subjects' tabular attributes ($d$ in total) along with their brain MRIs. B: The primary network is composed of pre-activation ResNet blocks followed by two linear layers. The last ResNet block (framed in red) gets a subset of its parameters from the hypernetwork. The primary network predictions are probability distributions ($P_{CN}, P_{MCI}, P_{AD}$) produced by the softmax layer. The loss is a weighted sum of the classification loss (weighted CE) and weight decay regularization, as detailed in Equations \ref{eq:loss_func} and \ref{eq:loss_weighted_CE}. C: The hypernetwork architecture. D: A closer look at the pre-activation Res Block. See text for details.
  • Figure 4: Conditioned brain age prediction results. A) An imaging-only experiment. The MAE scores obtained for baseline networks trained on male-only (blue), female-only (pink) or mixed (green) equally sized training sets, where the test sets are composed of male-only (left), female-only (middle) or mixed (right) subsets. B) HyperFusion: A comparison, using the MAE metric, between the baseline model using imaging data alone (orange), concatenation-based fusion (light blue), HyperFusion with a different embedding that takes a 2D input vector and outputs a 2D vector (olive green), and the proposed HyperFusion model (purple). The inference was performed using either male test data (left), female test data (middle) or the entire (mixed) test dataset (right).
  • Figure 5: Bar plot presentation of the AD classification results for six competing models and ours using seven different metrics (Section \ref{sec:exp_AD_trainingNevaluation}). The compared methods, including 'DAFT-like$^{*}$' and 'FiLM-like$^{*}$', are described in Section \ref{sec:AD_ablation}.
  • ...and 6 more figures
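
The mechanism described in the Figure 1 caption, in which each hypernetwork $h_k$ maps the tabular input $T$ to the parameters $\theta_{h_k}$ of one external layer of the primary network, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation (see the linked repository for that): the class and function names (`HyperLayer`, `primary_forward`), the single-linear-map hypernetworks, the layer sizes and the random initialisation are all assumptions chosen for illustration, and training by backpropagation is omitted.

```python
import random


def linear(x, W, b):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


class HyperLayer:
    """Hypothetical h_k: maps the tabular vector T to the weights and bias
    of one external linear layer (in_dim -> out_dim) of the primary network."""

    def __init__(self, tab_dim, in_dim, out_dim, rng):
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim  # weights plus biases
        # The hypernetwork's own parameters (phi); a single linear map here.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(tab_dim)]
                  for _ in range(n_params)]
        self.b = [0.0] * n_params

    def generate(self, T):
        """Return (W, b) for the external layer, conditioned on T."""
        flat = linear(T, self.W, self.b)
        split = self.out_dim * self.in_dim
        W = [flat[i * self.in_dim:(i + 1) * self.in_dim]
             for i in range(self.out_dim)]
        return W, flat[split:]


def primary_forward(features, T, hyper_layers):
    """Run image features through external layers whose parameters are
    generated from T; ReLU is applied between layers but not after the last."""
    x = features
    for k, h in enumerate(hyper_layers):
        W, b = h.generate(T)
        x = linear(x, W, b)
        if k < len(hyper_layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


rng = random.Random(0)
# Two external layers (4 -> 3 -> 1), conditioned on a 2D one-hot tabular input.
hyper_layers = [HyperLayer(2, 4, 3, rng), HyperLayer(2, 3, 1, rng)]
features = [0.5, -1.0, 2.0, 0.1]  # stand-in for features from internal layers

out_male = primary_forward(features, [1.0, 0.0], hyper_layers)
out_female = primary_forward(features, [0.0, 1.0], hyper_layers)
```

The same image features yield different predictions for the two tabular inputs because the external layers' parameters themselves depend on $T$, which is what distinguishes this form of conditioning from concatenating $T$ to the feature vector.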