FDRMFL: Multi-modal Federated Feature Extraction Model Based on Information Maximization and Contrastive Learning

Haozhe Wu

TL;DR

The paper tackles the problem of extracting predictive, low-dimensional features from multi-modal data in a federated, non-IID setting while addressing catastrophic forgetting. It introduces FDRMFL, a task-driven framework that combines information maximization, cross-modal alignment, and contrastive learning within a FedAvg training regime. Empirical results from synthetic simulations and real near-infrared spectroscopy datasets (Tecator and corn) show that FDRMFL consistently outperforms traditional linear reduction baselines and VAEs in both accuracy and stability, even under noise and cross-client heterogeneity. The work demonstrates practical potential for privacy-preserving, multi-modal regression in industrial contexts like food quality and agricultural analytics.
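As a rough illustration of the FedAvg regime mentioned above, the aggregation step can be sketched as a sample-size-weighted average of client parameters. This is a minimal sketch of generic FedAvg, not the paper's implementation; the function name `fedavg`, the flat-list parameter layout, and the `"encoder"` key are assumptions made for illustration.

```python
from typing import Dict, List


def fedavg(
    client_weights: List[Dict[str, List[float]]],
    client_sizes: List[int],
) -> Dict[str, List[float]]:
    """Generic FedAvg aggregation: average each parameter across clients,
    weighting every client by its number of local samples.

    The parameter layout (name -> flat list of floats) is hypothetical.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_weights: Dict[str, List[float]] = {}
    for name in client_weights[0]:
        dim = len(client_weights[0][name])
        global_weights[name] = [
            sum(n * w[name][j] for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)
        ]
    return global_weights


# Two hypothetical clients holding 10 and 30 samples.
clients = [{"encoder": [1.0, 2.0]}, {"encoder": [3.0, 6.0]}]
sizes = [10, 30]
global_model = fedavg(clients, sizes)
print(global_model)
```

The larger client pulls the average toward its own parameters, which is why non-IID data across clients can cause the drift that the contrastive constraint is meant to counteract.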

Abstract

This study addresses feature extraction for multi-modal data regression. Real-world scenarios pose three core challenges: limited and non-IID data, effective extraction and fusion of multi-modal information, and susceptibility to catastrophic forgetting during model learning. To address them, a task-driven, supervised, multi-modal federated feature extraction method is proposed. The method integrates multi-modal information extraction with contrastive learning and can use different neural network structures as the latent mapping function for each modality. Each client independently learns low-dimensional representations of its multi-modal data, and parameter tuning flexibly controls how much of the predictors' effective information about the response variable is retained in the low-dimensional features. The method's multi-constraint learning framework uses a Mean Squared Error loss to maintain regression accuracy. Three further constraints act together: a mutual information preservation constraint retains task-related information; a symmetric Kullback-Leibler divergence constraint drives the extraction, fusion, and alignment of multi-modal features; and an inter-model contrastive constraint mitigates representation drift and catastrophic forgetting in non-IID scenarios. This keeps the feature extraction process centered on improving downstream regression performance. Experimental results from simulations and real-world data analysis show that the proposed method improves downstream regression performance more than classical feature extraction techniques.
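The multi-constraint objective described in the abstract can be pictured as an MSE term plus weighted penalty terms. The sketch below is illustrative only: the abstract does not give the exact form of each constraint, so the diagonal-Gaussian symmetric KL, the weight names `lam_mi`, `lam_kl`, `lam_con`, and the treatment of the mutual information and contrastive terms as precomputed scalar penalties are all assumptions.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between responses and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def kl_diag_gauss(mu_p, var_p, mu_q, var_q) -> float:
    """KL(p || q) for two diagonal Gaussians given per-dimension means and variances."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(mu_a, var_a, mu_b, var_b) -> float:
    """Symmetric KL as the unnormalized sum KL(a || b) + KL(b || a).

    Assumption: each modality's latent code is summarized as a diagonal Gaussian.
    """
    return kl_diag_gauss(mu_a, var_a, mu_b, var_b) + kl_diag_gauss(mu_b, var_b, mu_a, var_a)


def total_loss(mse_term, mi_penalty, skl_term, contrastive_term,
               lam_mi=1.0, lam_kl=1.0, lam_con=1.0) -> float:
    """Hypothetical weighted sum of the four terms named in the abstract.

    mi_penalty and contrastive_term are passed in as already-computed scalars
    because their exact definitions are not stated in this section.
    """
    return mse_term + lam_mi * mi_penalty + lam_kl * skl_term + lam_con * contrastive_term


# Toy values: two 1-D modality latents with unit variance and means 0 and 1.
skl = symmetric_kl([0.0], [1.0], [1.0], [1.0])
reg = mse([1.0, 2.0], [1.0, 4.0])
loss = total_loss(reg, 0.5, skl, 0.25, lam_mi=2.0, lam_kl=0.5, lam_con=4.0)
print(skl, reg, loss)
```

Raising or lowering the weights is one way to read the abstract's remark that parameter tuning controls how much response-related information the low-dimensional features retain.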

Paper Structure

This paper contains 9 sections, 62 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overall architecture for federated multi-modal learning.
  • Figure 2: FDRMFL Multi-Modal Federated Feature Extraction and Evaluation Pipeline
  • Figure 3: Visualization of the MSE comparison of methods under Link Function scenarios
  • Figure 4: Visualization of the MSE comparison between VAE and FDRMFL under different link functions
  • Figure 5: Visualization of the comparison of prediction MSE by various methods on the Tecator dataset
  • ...and 1 more figure