Distributed Learning and Function Fusion in Reproducing Kernel Hilbert Space
Aneesh Raghavan, Karl Henrik Johansson
TL;DR
This work develops a forget-and-relearn distributed learning framework in Reproducing Kernel Hilbert Spaces (RKHS) for two agents and a fusion center. Each agent learns locally in its own RKHS with kernel $K^i$, uploads a function to a fusion center whose fused kernel is $K=K^1+K^2$, and then relearns the fused function via a regularized least-squares problem in the fused RKHS; the process uses uploading/downloading operators and a data-recovery fusion step to iteratively refine the global estimator. Theoretical results establish closed-form estimators at the agents, bounds on the learning operators and conditions under which their norms converge to $1$, and strong consistency with fixed-point guarantees; an example demonstrates the approach with heterogeneous features and a compact fused RKHS. The framework supports heterogeneous, privacy-preserving data fusion and points to future directions in nonparametric extensions, transform-based fusion, and knowledge transfer with reasoning integrations.
Abstract
We consider the problem of function estimation by a multi-agent system comprising two agents and a fusion center. Each agent receives data consisting of samples of an independent variable (input) and the corresponding values of the dependent variable (output). The data remains local and is not shared with other members of the system. The objective of the system is to collaboratively estimate the function from the input to the output. To this end, we develop an iterative distributed algorithm for this function estimation problem. Each agent solves a local estimation problem in a Reproducing Kernel Hilbert Space (RKHS) and uploads the estimated function to the fusion center. At the fusion center, the functions are fused by first estimating the data points that would have generated the uploaded functions and then solving a least squares estimation problem using the estimated data from both functions. The fused function is downloaded by the agents and is used, along with incoming data, for estimation at the next iteration. This procedure is executed sequentially and stopped when the difference between consecutively estimated functions becomes small enough. To analyze the algorithm, we define learning operators for the agents, the fusion center, and the system. We study the asymptotic properties of the norm of the learning operators and find sufficient conditions under which they converge to $1$. Given a sequence of data points, we define and prove the existence of the learning operator for the system. We prove that the proposed learning algorithm is consistent and demonstrate this with an example. The paper has been submitted to L4DC 2024.
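As a rough illustration of one round of the procedure described above (local estimation, upload, fusion), the following minimal sketch may help. It is not the authors' implementation. The Gaussian kernels, their bandwidths, the target function, the noise level, and the regularization weight `lam` are all assumptions chosen for illustration. In particular, the data-recovery step is simplified here to evaluating each uploaded function at its own sample locations, which may differ from the recovery procedure in the paper, and the download step and subsequent iterations are omitted.

```python
import numpy as np

def gaussian_kernel(bandwidth):
    """Return a function computing the Gaussian Gram matrix between 1-D arrays."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-(d ** 2) / (2.0 * bandwidth ** 2))
    return k

def fit_rls(kernel, x, y, lam):
    """Regularized least squares in the RKHS of `kernel`; returns coefficients."""
    gram = kernel(x, x)
    return np.linalg.solve(gram + lam * np.eye(len(x)), y)

def evaluate(kernel, centers, coef, x):
    """Evaluate f(x) = sum_j coef[j] * kernel(centers[j], x)."""
    return kernel(x, centers) @ coef

rng = np.random.default_rng(0)
target = lambda x: np.sin(2.0 * np.pi * x)  # hypothetical ground-truth function
lam = 1e-3                                  # assumed regularization weight

# Each agent has its own kernel K^i; the fusion center uses K = K^1 + K^2.
k1 = gaussian_kernel(0.1)
k2 = gaussian_kernel(0.3)
k_fused = lambda a, b: k1(a, b) + k2(a, b)

# Local data: each agent only sees inputs from half of the domain (assumed).
x1 = rng.uniform(0.0, 0.5, 20)
x2 = rng.uniform(0.5, 1.0, 20)
y1 = target(x1) + 0.05 * rng.normal(size=20)
y2 = target(x2) + 0.05 * rng.normal(size=20)

# Local estimation in each agent's RKHS; only functions are uploaded.
c1 = fit_rls(k1, x1, y1, lam)
c2 = fit_rls(k2, x2, y2, lam)

# Fusion (simplified data recovery): evaluate the uploaded functions at their
# own centers to obtain pseudo-data, then solve least squares in the fused RKHS.
y1_hat = evaluate(k1, x1, c1, x1)
y2_hat = evaluate(k2, x2, c2, x2)
x_all = np.concatenate([x1, x2])
y_all = np.concatenate([y1_hat, y2_hat])
c_fused = fit_rls(k_fused, x_all, y_all, lam)

# Compare errors over the whole domain.
grid = np.linspace(0.0, 1.0, 200)
rmse = lambda f: float(np.sqrt(np.mean((f - target(grid)) ** 2)))
agent1_rmse = rmse(evaluate(k1, x1, c1, grid))
fused_rmse = rmse(evaluate(k_fused, x_all, c_fused, grid))
print(f"agent 1 RMSE: {agent1_rmse:.3f}, fused RMSE: {fused_rmse:.3f}")
```

In this sketch the raw outputs `y1` and `y2` never leave the agents; the fusion center works only with the uploaded functions, represented by their centers and coefficients. A full iteration would also need a download step that maps the fused function back into each agent's RKHS, which is left out here.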
