Measuring Heterogeneity in Machine Learning with Distributed Energy Distance

Mengchen Fan, Baocheng Geng, Roman Shterenberg, Joseph A. Casey, Zhong Chen, Keren Li

TL;DR

This work tackles heterogeneity in distributed and federated learning by adopting the energy distance as a principled, sensitive measure of feature distribution differences across nodes. It defines the Energy Distance and the Energy Coefficient $H$ and develops Taylor-based approximations to enable scalable computation in large-scale systems, along with corrections for skewness and kurtosis. Empirical results on simulated distributions and MNIST-based federated settings demonstrate that the Taylor approximations closely track exact calculations while substantially reducing runtime, and show how feature heterogeneity quantified by $H$ relates to learning performance. The authors also propose using $H$ as a penalty weight to align predictions across heterogeneous nodes, offering a practical mechanism to improve coordination and convergence in non-IID distributed environments.

Abstract

In distributed and federated learning, heterogeneity across data sources remains a major obstacle to effective model aggregation and convergence. We focus on feature heterogeneity and introduce energy distance as a sensitive measure for quantifying distributional discrepancies. While we show that energy distance is robust for detecting data distribution shifts, its direct use in large-scale systems can be prohibitively expensive. To address this, we develop Taylor approximations that preserve key theoretical quantitative properties while reducing computational overhead. Through simulation studies, we show how accurately capturing feature discrepancies boosts convergence in distributed learning. Finally, we propose a novel application of energy distance to assign penalty weights for aligning predictions across heterogeneous nodes, ultimately enhancing coordination in federated and distributed settings.
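To make the quantities named above concrete, the following is a minimal sketch of a sample-based energy distance and a normalized energy coefficient for one-dimensional features. It assumes the standard Székely-Rizzo form, $\mathcal{E}(X,Y) = 2\,\mathbb{E}|X-Y| - \mathbb{E}|X-X'| - \mathbb{E}|Y-Y'|$ with $H = \mathcal{E}(X,Y) / (2\,\mathbb{E}|X-Y|)$; the paper's exact definition of $H$ may differ. The function names and the simulated node data are illustrative assumptions, not taken from the paper. The exact computation visits every pair of samples, which is the quadratic cost that motivates the Taylor approximations.

```python
import random


def mean_abs_diff(xs, ys):
    """Average of |x - y| over all pairs (x in xs, y in ys)."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Plug-in estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'| (assumed standard form)."""
    return (2 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))


def energy_coefficient(xs, ys):
    """Energy distance normalized by 2 E|X-Y|, so the value lies in [0, 1]."""
    cross = 2 * mean_abs_diff(xs, ys)
    return energy_distance(xs, ys) / cross if cross > 0 else 0.0


# Hypothetical feature samples held by three nodes.
rng = random.Random(0)
node_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
node_b = [rng.gauss(0.0, 1.0) for _ in range(200)]  # same distribution as node_a
node_c = [rng.gauss(3.0, 1.0) for _ in range(200)]  # mean-shifted features

h_similar = energy_coefficient(node_a, node_b)
h_shifted = energy_coefficient(node_a, node_c)
print(f"H(a, b) = {h_similar:.4f}, H(a, c) = {h_shifted:.4f}")
```

Under these assumptions, nodes drawing from the same distribution should yield a coefficient near 0, while the mean-shifted node should yield a clearly larger value.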

Paper Structure

This paper contains 11 sections, 17 equations, 5 figures.

Figures (5)

  • Figure 1: Computational time.
  • Figure 2: Energy coefficient $H$ for various distributions.
  • Figure 3: Mixed MNIST inputs.
  • Figure 4: MNIST inputs with different feature distributions.
  • Figure 5: Test accuracy on the MNIST dataset for federated learning with different feature distributions.