Scalable Asynchronous Federated Modeling for Spatial Data

Jianwei Shi, Sameh Abdulah, Ying Sun, Marc G. Genton

TL;DR

This work develops a scalable asynchronous federated modeling framework for spatial data using a knots-based low-rank Gaussian process. It introduces block-wise updates with local gradient correction, staleness-aware adaptive aggregation, and moving-average stabilization, and proves linear convergence with explicit dependence on staleness. Empirical results show that the low-rank model better preserves cross-worker spatial structure than an independence model, and that asynchronous updates outperform synchronous ones in heterogeneous settings while staying competitive when resources are balanced. The approach offers robust, privacy-preserving, and scalable spatial inference suitable for distributed environmental, urban, and public-health applications.

Abstract

Spatial data are central to applications such as environmental monitoring and urban planning, but they are often distributed across devices where privacy and communication constraints limit direct sharing. Federated modeling offers a practical solution that preserves data privacy while enabling global modeling across distributed data sources. For instance, environmental sensor networks are privacy- and bandwidth-constrained, motivating federated spatial modeling that shares only privacy-preserving summaries to produce timely, high-resolution pollution maps without centralizing raw data. However, existing federated modeling approaches either ignore spatial dependence or rely on synchronous updates that suffer from stragglers in heterogeneous environments. This work proposes an asynchronous federated modeling framework for spatial data based on low-rank Gaussian process approximations. The method employs block-wise optimization and introduces strategies for gradient correction, adaptive aggregation, and stabilized updates. We establish linear convergence with explicit dependence on staleness, a result of standalone theoretical significance. Moreover, numerical experiments demonstrate that the asynchronous algorithm matches the performance of its synchronous counterpart under balanced resource allocation and significantly outperforms it in heterogeneous settings, demonstrating greater robustness and scalability.
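To make the idea of staleness-aware aggregation with moving-average stabilization concrete, the following is a minimal sketch under stated assumptions, not the paper's actual update rule. The class `Server`, the field `base_weight`, and the discount `base_weight / (1 + staleness)` are all hypothetical choices made for illustration: the server blends each incoming worker update into its current parameters, and gives less weight to updates computed from an older parameter version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    """Toy asynchronous server (hypothetical, for illustration only)."""

    params: List[float]
    version: int = 0           # number of updates applied so far
    base_weight: float = 0.5   # assumed mixing weight for a fresh update

    def staleness_weight(self, staleness: int) -> float:
        # Assumed polynomial discount: the staler the update, the smaller its weight.
        return self.base_weight / (1.0 + staleness)

    def apply(self, worker_params: List[float], worker_version: int) -> float:
        # Staleness = how many server updates happened since the worker
        # last read the parameters.
        staleness = self.version - worker_version
        w = self.staleness_weight(staleness)
        # Moving-average style update: convex combination of old and new.
        self.params = [(1.0 - w) * p + w * q
                       for p, q in zip(self.params, worker_params)]
        self.version += 1
        return w


server = Server(params=[0.0, 0.0])
w_fresh = server.apply([1.0, 1.0], worker_version=0)  # staleness 0
w_stale = server.apply([1.0, 1.0], worker_version=0)  # staleness 1
print(w_fresh, w_stale, server.params)
```

In this toy run the second update was computed from version 0 but arrives after one server update, so it is applied with half the weight of the first.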

Paper Structure

This paper contains 21 sections, 15 theorems, 153 equations, 16 figures, and 3 algorithms.

Key Result

Proposition 4

Let $P$, $P_1$, and $P_2$ denote the distributions of the latent variables under the original Gaussian process (eq:spatial_data), the low-rank model with full local covariance (eq:full_local_covariance), and the independent model (eq:ind_log_likelihood), respectively. Then, the dimension-normalized KL div… where $\mathrm{KL}_N(P \,\|\, Q) := \frac{1}{N} \mathrm{KL}(P \,\|\, Q)$ is the KL divergence scale…
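The definition $\mathrm{KL}_N(P \,\|\, Q) := \frac{1}{N}\mathrm{KL}(P \,\|\, Q)$ can be illustrated for zero-mean Gaussians, where the standard closed form is $\mathrm{KL}(P \,\|\, Q) = \tfrac{1}{2}\left[\operatorname{tr}(\Sigma_Q^{-1}\Sigma_P) - N + \log\det\Sigma_Q - \log\det\Sigma_P\right]$. The sketch below is a plain-Python illustration, not the paper's code. The function names, the exponential covariance, and the parameter values are assumptions chosen for the example; the "independent" comparison simply keeps the diagonal of the covariance and is not claimed to match the paper's independent model.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ y = b for lower-triangular low."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return y


def normalized_gaussian_kl(cov_p, cov_q):
    """KL_N(P || Q) = KL(P || Q) / N for zero-mean Gaussians P and Q."""
    n = len(cov_p)
    lp, lq = cholesky(cov_p), cholesky(cov_q)
    # tr(cov_q^{-1} cov_p) equals the squared Frobenius norm of lq^{-1} lp.
    trace = 0.0
    for j in range(n):
        col = forward_solve(lq, [lp[i][j] for i in range(n)])
        trace += sum(v * v for v in col)
    logdet_p = 2.0 * sum(math.log(lp[i][i]) for i in range(n))
    logdet_q = 2.0 * sum(math.log(lq[i][i]) for i in range(n))
    return 0.5 * (trace - n + logdet_q - logdet_p) / n


def exp_cov(locs, sigma2=1.0, beta=1.0):
    """Assumed exponential covariance sigma2 * exp(-|s - t| / beta) on a line."""
    return [[sigma2 * math.exp(-abs(s - t) / beta) for t in locs] for s in locs]


locs = [0.0, 1.0, 2.0, 3.0]
full = exp_cov(locs)
# Keep only the diagonal: every location is treated as independent.
diag = [[full[i][j] if i == j else 0.0 for j in range(4)] for i in range(4)]
print(normalized_gaussian_kl(full, diag))
```

Dividing by $N$ makes values comparable across problem sizes: for $P = \mathcal{N}(0, I_N)$ and $Q = \mathcal{N}(0, 2I_N)$ the normalized divergence is $\tfrac{1}{2}(\log 2 - \tfrac{1}{2})$ for every $N$.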

Figures (16)

  • Figure 1: Boxplots of parameter estimates under varying covariance settings: $\sigma$ (left) and $\beta$ (right).
  • Figure 2: Boxplots of parameter estimates under varying sample sizes: $\sigma$ (left) and $\beta$ (right).
  • Figure 3: Boxplots of parameter estimates under varying number of workers: $\sigma$ (top) and $\beta$ (bottom).
  • Figure 4: Boxplots of KL divergence of independent model and low-rank model with varying knot numbers.
  • Figure 5: Performance comparison of asynchronous strategies under the core assignment.
  • ...and 11 more figures

Theorems & Definitions (25)

  • Remark 1: Knot selection
  • Remark 2: Low-Rank Models
  • Remark 3: Covariance structure for the residual
  • Proposition 4
  • Remark 5: Normalization of KL Divergence with Dimension
  • Remark 6: KL Divergence and Likelihood Connection
  • Remark 7
  • Remark 8: Hyperparameter tuning
  • Example 1
  • Remark 9
  • ...and 15 more