
Operator-Theoretic Framework for Gradient-Free Federated Learning

Mohit Kumar, Mathias Brucker, Alexander Valentinitsch, Adnan Husakovic, Ali Abbas, Manuela Geiß, Bernhard A. Moser

TL;DR

The paper introduces an operator-theoretic, gradient-free framework for federated learning that handles data heterogeneity, privacy, and secure inference. By mapping the optimal $L^2$ solution into an RKHS via a forward operator and back via the inverse operator, it derives a data-driven, nonparametric learning solution with non-asymptotic risk and error bounds. It defines a data-dependent hypothesis space through a generalized kernel tied to Kernel Affine Hull Machines (KAHMs) and a space folding mechanism, enabling low-communication aggregation and a posterior-probability interpretation. Privacy is achieved through a principled DP mechanism with kernel smoothing, while secure inference uses TFHE-based fully homomorphic encryption on scalar space-folding summaries, requiring $Q \times C$ minimum and $C$ equality-comparison operations per test point. Across four benchmarks, the approach matches or outperforms gradient-based FL under non-IID and long-tailed distributions. The experiments also demonstrate DP robustness and practical FHE latencies, making the framework a principled, end-to-end alternative to gradient-based federated learning in heterogeneous settings.

Abstract

Federated learning must address heterogeneity, strict communication and computation limits, and privacy while ensuring performance. We propose an operator-theoretic framework that maps the $L^2$-optimal solution into a reproducing kernel Hilbert space (RKHS) via a forward operator, approximates it using available data, and maps back with the inverse operator, yielding a gradient-free scheme. Finite-sample bounds are derived using concentration inequalities over operator norms, and the framework identifies a data-dependent hypothesis space with guarantees on risk, error, robustness, and approximation. Within this space we design efficient kernel machines leveraging the space folding property of Kernel Affine Hull Machines. Clients transfer knowledge via a scalar space folding measure, reducing communication and enabling a simple differentially private protocol: summaries are computed from noise-perturbed data matrices in one step, avoiding per-round clipping and privacy accounting. The induced global rule requires only integer minimum and equality-comparison operations per test point, making it compatible with fully homomorphic encryption (FHE). Across four benchmarks, the gradient-free FL method with fixed encoder embeddings matches or outperforms strong gradient-based fine-tuning, with gains up to 23.7 points. In differentially private experiments, kernel smoothing mitigates accuracy loss in high-privacy regimes. The global rule admits an FHE realization using $Q \times C$ encrypted minimum and $C$ equality-comparison operations per test point, with operation-level benchmarks showing practical latencies. Overall, the framework provides provable guarantees with low communication, supports private knowledge transfer via scalar summaries, and yields an FHE-compatible prediction rule offering a mathematically grounded alternative to gradient-based federated learning under heterogeneity.
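The abstract states only that the global rule needs integer minimum and equality-comparison operations per test point; it does not spell out the rule itself. The plaintext sketch below is one way such a rule could look, under assumptions that are not taken from the paper: $Q$ is read as the number of clients and $C$ as the number of classes, each client reports one integer-quantized space folding measure per class, a smaller measure means a better match, and the function name `global_rule` and the `measures[q][c]` layout are hypothetical. In an FHE deployment the same two primitives would run on ciphertexts instead of Python integers.

```python
from functools import reduce
from typing import List


def global_rule(measures: List[List[int]]) -> List[int]:
    """Return one 0/1 flag per class using only min and equality.

    measures[q][c] is the (hypothetical) integer-quantized space folding
    measure that client q reports for class c on a single test point.
    Assumption: a smaller measure indicates a better match to the class.
    """
    if not measures or not measures[0]:
        raise ValueError("need at least one client and one class")
    num_classes = len(measures[0])

    # Aggregate across clients: pairwise minimum per class.
    per_class = [
        reduce(min, (client[c] for client in measures))
        for c in range(num_classes)
    ]

    # Minimum across classes, again with pairwise minimum only.
    best = reduce(min, per_class)

    # One equality comparison per class marks the winning class(es).
    return [int(value == best) for value in per_class]


if __name__ == "__main__":
    # Two clients, three classes.
    print(global_rule([[5, 2, 9], [4, 7, 3]]))  # prints [0, 1, 0]
```

Ties produce more than one flag set to 1; how ties are resolved is left open here because the source does not say.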

Paper Structure

This paper contains 93 sections, 12 theorems, 6 equations, 4 figures, 10 tables.

Key Result

Theorem 1

The following holds with probability at least $1-\delta$ for any $\delta \in (0,1)$:

Figures (4)

  • Figure 1: The operator-theoretic kernel FL framework is developed by 1) considering the optimal learning solution in $L^2(\mathbb{R}^n,\mathbb{P}_{x})$, 2) mapping the optimal solution onto an RKHS (associated with a generalized kernel) using an operator, 3) approximating the optimal solution in the RKHS using the available data samples, 4) mapping the sample-approximated solution back onto $L^2(\mathbb{R}^n,\mathbb{P}_{x})$ using the inverse operator, 5) analyzing the sample-approximated solution in $L^2(\mathbb{R}^n,\mathbb{P}_{x})$ and identifying conditions on the kernel choice that define the hypothesis space, and 6) implementing a suitable hypothesis with minimum computational and communication cost in the federated setting using kernel models.
  • Figure 2: An illustration of the space folding property possessed by a KAHM.
  • Figure 3: An example of the global space folding measure associated with the distributed data.
  • Figure 4: The proposed space folding measure-based methodology, referred to as SFM, estimates for a given input $x$ the probability of the $c^{\text{th}}$ class, $\Phi^*_c(x)$, without imposing statistical assumptions on clients' data distributions, thereby ensuring robustness to statistical heterogeneity.

Theorems & Definitions (42)

  • Remark 1: Data Heterogeneity across Clients
  • Remark 2: Rationale for the Restrictive Feature-Map
  • Theorem 1: Risk for Sample Approximation of the Optimal Solution in RKHS
  • proof
  • Theorem 2: Risk for Generalized Learning Solution
  • proof
  • Theorem 3: Prediction Error Bound for Generalized Learning Solution
  • proof
  • Remark 3: Robustness of Generalized Learning Solution
  • Theorem 4: Approximation Error Bound for Generalized Learning Solution
  • ...and 32 more