DF-DM: A foundational process model for multimodal data fusion in the artificial intelligence era

David Restrepo, Chenwei Wu, Constanza Vásquez-Venegas, Luis Filipe Nakayama, Leo Anthony Celi, Diego M López

TL;DR

This work introduces Data Fusion for Data Mining (DF-DM), a foundational process model that extends the Data Fusion Information Group (DFIG) model by integrating the CRISP-DM process and foundation-model embeddings for efficient multimodal data fusion. It proposes a disentangled dense fusion mechanism that separates modality-common from modality-specific information, optimizing mutual information while enabling dense inter-modality interactions. Validation across three use cases (diabetic retinopathy prediction, domestic violence prediction from open data, and MIMIC-CXR analysis) shows strong predictive performance and bias-awareness capabilities: a Macro F1 of 0.92 for diabetic retinopathy, macro AUCs of 0.92 and 0.99 for disease prediction and sex classification, and an R^2 of 0.854 for time-series domestic violence prediction. The framework emphasizes efficiency, flexibility, and fairness, arguing for broader applicability in resource-constrained settings and complex, heterogeneous data contexts, while acknowledging limitations related to foundation-model availability and bias mitigation needs.

Abstract

In the big data era, integrating diverse data modalities poses significant challenges, particularly in complex fields like healthcare. This paper introduces a new process model for multimodal Data Fusion for Data Mining, integrating embeddings and the Cross-Industry Standard Process for Data Mining with the existing Data Fusion Information Group model. Our model aims to decrease computational costs, complexity, and bias while improving efficiency and reliability. We also propose "disentangled dense fusion", a novel embedding fusion method designed to optimize mutual information and facilitate dense inter-modality feature interaction, thereby minimizing redundant information. We demonstrate the model's efficacy through three use cases: predicting diabetic retinopathy using retinal images and patient metadata, domestic violence prediction employing satellite imagery, internet, and census data, and identifying clinical and demographic features from radiography images and clinical notes. The model achieved a Macro F1 score of 0.92 in diabetic retinopathy prediction, an R-squared of 0.854 and sMAPE of 24.868 in domestic violence prediction, and a macro AUC of 0.92 and 0.99 for disease prediction and sex classification, respectively, in radiological analysis. These results underscore the Data Fusion for Data Mining model's potential to significantly impact multimodal data processing, promoting its adoption in diverse, resource-constrained settings.
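
For readers less familiar with the metrics reported in the abstract, the following is a minimal sketch of how Macro F1, sMAPE, and R-squared are commonly computed. It is not the authors' evaluation code. The function names are illustrative, and the sMAPE variant shown (mean of |error| over the average of |actual| and |predicted|, scaled to percent) is an assumption, since the abstract does not state which definition was used.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0 by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant, range 0 to 200)."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        # Treat the 0/0 case (both values zero) as zero error.
        terms.append(abs(p - t) / denom if denom else 0.0)
    return 100 * sum(terms) / len(terms)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Under this sMAPE definition lower is better and 0 means a perfect forecast, whereas Macro F1 and R-squared are better when closer to 1.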

Paper Structure (49 sections, 11 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: The DFIG data fusion model extended with AI and ML, as proposed in [41]. The levels where AI and ML techniques are proposed are shown in red; the original DFIG model is shown in blue.
  • Figure 2: The proposed Data Fusion for Data Mining (DF-DM) model. It builds on the DFIG model with AI and ML integrated, and adds further functionalities vital for data mining tasks, shown in orange.
  • Figure 3: Illustrative framework for using foundation models in various tasks. The figure assumes an initial foundation model trained for a general task and three options for applying it. Option 1 is zero-shot learning, using the foundation model directly for a downstream task. Option 2 is embedding extraction, where embeddings of the original data are extracted and used to train a downstream model. Option 3 is fine-tuning the full model for a specific task; the resulting model can also be used for embedding extraction.
  • Figure 4: Disentangled dense data fusion model for classification tasks.
  • Figure 5: Satellite image embedding extraction approach using a variational autoencoder with a ResNet-50 V2 backbone as the encoder.
  • ...and 1 more figure