Interpretable Data Fusion for Distributed Learning: A Representative Approach via Gradient Matching

Mengchen Fan; Baocheng Geng; Keren Li; Xueqian Wang; Pramod K. Varshney

TL;DR

The paper addresses the challenge of interpretable, privacy-preserving distributed learning by proposing a gradient-matching, representative-based framework that condenses batches into a single interpretable point and uses a residual term to align gradients with full data updates. It presents two components: a representative-based centralized learner and a distributed protocol in which clients generate local representatives and the server aggregates them with residual corrections, enabling real-time training with reduced communication. Empirical results on non-IID MNIST and multivariate datasets show that the approach can match or exceed FedAVG in accuracy and convergence, with greater gains for more complex models (e.g., CNNs) and larger client counts; residuals further stabilize and improve performance. The work advances human-in-the-loop AI by preserving data structure in representations, improving interpretability and oversight, and offering a practical path toward more transparent, collaborative distributed learning systems.
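The idea of a representative plus a residual correction can be illustrated with a minimal sketch. This is not the paper's construction, which is not shown here: it assumes a linear model with squared-error loss and, purely for illustration, uses the batch mean as the representative point. The function names (`batch_gradient`, `representative_and_residual`, `server_gradient`) are hypothetical.

```python
import numpy as np

def batch_gradient(w, X, y):
    """Gradient of the mean squared-error loss (1/2n) * sum (x_i.w - y_i)^2."""
    return X.T @ (X @ w - y) / len(y)

def representative_and_residual(w, X, y):
    """Client side: condense a batch into one point plus a gradient residual.

    Assumption: the representative is the batch mean. The paper obtains its
    representative via gradient matching; the mean is only a stand-in here.
    """
    x_rep, y_rep = X.mean(axis=0), y.mean()
    g_rep = x_rep * (x_rep @ w - y_rep)          # gradient at the single point
    residual = batch_gradient(w, X, y) - g_rep   # gap to the full-batch gradient
    return x_rep, y_rep, residual

def server_gradient(w, x_rep, y_rep, residual):
    """Server side: rebuild the batch gradient from representative + residual."""
    return x_rep * (x_rep @ w - y_rep) + residual

# Synthetic batch (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=32)
w = np.zeros(3)

x_rep, y_rep, residual = representative_and_residual(w, X, y)
g_full = batch_gradient(w, X, y)
g_server = server_gradient(w, x_rep, y_rep, residual)
```

In this toy setting the residual makes the server-side gradient coincide with the full-batch gradient at the current weights, while the representative itself remains a single point in the original data space that a person can inspect.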

Abstract

This paper introduces a representative-based approach for distributed learning that transforms multiple raw data points into a virtual representation. Unlike traditional distributed learning methods such as Federated Learning, which do not offer human interpretability, our method makes complex machine learning processes accessible and comprehensible. It achieves this by condensing extensive datasets into digestible formats, thus fostering intuitive human-machine interactions. Additionally, this approach maintains privacy and communication efficiency, and it matches the training performance of models using raw data. Simulation results show that our approach is competitive with or outperforms traditional Federated Learning in accuracy and convergence, especially in scenarios with complex models and a higher number of clients. This framework marks a step forward in integrating human intuition with machine intelligence, which potentially enhances human-machine learning interfaces and collaborative efforts.

Paper Structure (13 sections, 5 equations, 4 figures, 6 tables, 2 algorithms)

Introduction
Problem Formulation
Methodology
Representative-based Centralized Learning
Representative Data Fusion for Distributed Learning
Simulation Results
Experimental Setup
Multivariate Data Experiments
MNIST Data for Centralized Training
MNIST Data for Distributed Training
Improvements Achieved by Incorporating the Residual
Discussion
Conclusions and Future Work

Figures (4)

Figure 1: Convergence of representative-based distributed learning on the MNIST dataset.
Figure 2: Comparison of convergence between representative-based distributed learning and FedAVG on the MNIST dataset.
Figure 3: Convergence of representative-based distributed learning on the MNIST dataset with and without the residual.
Figure 4: Representatives constructed from raw images with MLP and CNN models.
