Bayes-Split-Edge: Bayesian Optimization for Constrained Collaborative Inference in Wireless Edge Systems

Fatemeh Zahra Safaeipour, Jacob Chakareski, Morteza Hashemi

TL;DR

The paper tackles energy- and latency-constrained collaborative inference in wireless edge networks by proposing Bayes-Split-Edge, a constraint-aware Bayesian optimization framework that jointly selects the neural network split point and transmit power. By relaxing the discrete split index to a continuous variable and employing a hybrid acquisition function, the method achieves fast, sample-efficient convergence while strictly respecting energy and delay budgets. Theoretical regret bounds are established, and extensive experiments on realistic edge setups demonstrate near-optimal performance with orders of magnitude fewer evaluations than exhaustive search and clear superiority over baselines such as CMA-ES, DIRECT, and PPO-based RL. This approach enables practical real-time adaptive offloading and split decisions across dynamic wireless channels, with potential impact on XR and other latency-critical edge applications.
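The relax-then-round idea can be sketched in a few lines. This is a minimal illustration, not the paper's method: the toy `evaluate` function, the budgets, the parameter ranges, and the penalized upper-confidence-bound acquisition are all assumptions made for this example (the paper's hybrid acquisition function is not reproduced here). The split index is treated as a continuous coordinate in [0, 1], rounded to an integer layer only when a configuration is evaluated.

```python
import numpy as np

NUM_LAYERS = 8                  # hypothetical number of candidate split points
P_MIN, P_MAX = 0.01, 0.5        # hypothetical transmit power range (W)
D_MAX, E_MAX = 0.085, 0.04      # hypothetical delay (s) and energy (J) budgets

def evaluate(split, power):
    """Toy stand-in for one real measurement: returns (utility, delay, energy)."""
    tx = 0.02 / np.log2(1.0 + 100.0 * power)      # made-up transmission delay (s)
    utility = 0.5 + 0.06 * split - 0.1 * power    # made-up utility
    delay = 0.01 * split + tx
    energy = 0.005 * split + power * tx
    return utility, delay, energy

def decode(z):
    """Map a point in [0, 1]^2 to (integer split index, transmit power)."""
    split = int(np.round(z[0] * (NUM_LAYERS - 1))) + 1
    power = P_MIN + z[1] * (P_MAX - P_MIN)
    return split, power

def matern52(a, b, ls=0.3):
    """Matern-5/2 kernel matrix between row-wise point sets a and b."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    s = np.sqrt(5.0) * d / ls
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a GP fit to standardized targets."""
    shift, scale = y_train.mean(), (y_train.std() or 1.0)
    k = matern52(x_train, x_train) + noise * np.eye(len(x_train))
    ks = matern52(x_query, x_train)
    mu = ks @ np.linalg.solve(k, (y_train - shift) / scale)
    var = 1.0 - np.einsum("ij,ji->i", ks, np.linalg.solve(k, ks.T))
    return shift + scale * mu, scale * np.sqrt(np.clip(var, 1e-12, None))

def optimize(budget=20, n_init=5, beta=2.0, rho=10.0, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.random((n_init, 2))
    obs = np.array([evaluate(*decode(zi)) for zi in z])  # columns: utility, delay, energy
    for _ in range(budget - n_init):
        cand = rng.random((512, 2))                      # random candidate set
        mu_u, sd_u = gp_posterior(z, obs[:, 0], cand)
        mu_d, _ = gp_posterior(z, obs[:, 1], cand)
        mu_e, _ = gp_posterior(z, obs[:, 2], cand)
        # Relative predicted overshoot of each budget (zero when predicted feasible).
        violation = (np.maximum(mu_d - D_MAX, 0.0) / D_MAX
                     + np.maximum(mu_e - E_MAX, 0.0) / E_MAX)
        acq = mu_u + beta * sd_u - rho * violation       # penalized UCB (assumption)
        z_next = cand[np.argmax(acq)]
        z = np.vstack([z, z_next])
        obs = np.vstack([obs, evaluate(*decode(z_next))])
    feasible = (obs[:, 1] <= D_MAX) & (obs[:, 2] <= E_MAX)
    if not feasible.any():
        return None
    best = int(np.argmax(np.where(feasible, obs[:, 0], -np.inf)))
    split, power = decode(z[best])
    return split, power, float(obs[best, 0])
```

Calling `optimize()` returns the best feasible `(split, power, utility)` found within the evaluation budget, or `None` if no evaluated point met both budgets. Separate surrogates for utility, delay, and energy keep the penalty term interpretable; other constraint-handling schemes would fit the same loop.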

Abstract

Mobile edge devices (e.g., AR/VR headsets) typically need to complete timely inference tasks while operating with limited on-board computing and energy resources. In this paper, we investigate the problem of collaborative inference in wireless edge networks, where energy-constrained edge devices aim to complete inference tasks within given deadlines. These tasks are carried out using neural networks, and the edge device seeks to optimize inference performance under energy and delay constraints. The inference process can be split between the edge device and an edge server, thereby achieving collaborative inference over wireless networks. We formulate an inference utility optimization problem subject to energy and delay constraints, and propose a novel solution called Bayes-Split-Edge, which leverages Bayesian optimization for collaborative split inference over wireless edge networks. Our solution jointly optimizes the transmission power and the neural network split point. The Bayes-Split-Edge framework incorporates a novel hybrid acquisition function that balances inference task utility, sample efficiency, and constraint violation penalties. We evaluate our approach using the VGG19 model on the ImageNet-Mini dataset and ResNet101 on Tiny-ImageNet, together with real-world mMobile wireless channel datasets. Numerical results demonstrate that Bayes-Split-Edge achieves up to a 2.4x reduction in evaluation cost compared to standard Bayesian optimization and achieves near-linear convergence. It also outperforms several baselines, including CMA-ES, DIRECT, exhaustive search, and Proximal Policy Optimization (PPO), while matching exhaustive search performance under tight constraints. These results confirm that the proposed framework provides a sample-efficient, constraint-aware solution, requiring at most 20 function evaluations, for wireless split inference in edge computing systems.
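The constrained problem described in the abstract can be written schematically as follows. The symbols are placeholders chosen for this sketch and are not taken from the paper: $l$ is the split index among $L$ candidate layers, $p$ is the transmit power, $U$ is the inference utility, $E$ and $D$ are the device energy and end-to-end delay, and $E_{\max}$, $D_{\max}$ are the corresponding budgets.

```latex
\begin{aligned}
\max_{l,\,p}\quad & U(l, p) \\
\text{s.t.}\quad  & E(l, p) \le E_{\max}, \\
                  & D(l, p) \le D_{\max}, \\
                  & l \in \{1, \dots, L\}, \quad p \in [p_{\min}, p_{\max}].
\end{aligned}
```

The mixed discrete-continuous domain is what motivates relaxing $l$ to a continuous variable before applying Bayesian optimization.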

Paper Structure

This paper contains 17 sections, 1 theorem, 17 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Assume the objective lies in the RKHS of a Matérn kernel, constraints are Lipschitz continuous, and the optimum is well-separated from the boundary. Then the cumulative regret of our method satisfies a bound stated in terms of $\gamma_T^{(\delta)}$, the information gain over the feasible region.
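For reference, cumulative regret in Bayesian optimization is conventionally defined as below. The notation is an assumption for this sketch ($f$ for the objective, $x_t$ for the point evaluated at round $t$, $x^\star$ for the best feasible point) and may differ from the paper's.

```latex
R_T = \sum_{t=1}^{T} \bigl( f(x^\star) - f(x_t) \bigr)
```

A bound on $R_T$ that grows sublinearly in $T$ implies $R_T / T \to 0$, meaning the average evaluated point approaches the optimum as the number of evaluations increases.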

Figures (10)

  • Figure 1: System overview of wireless split learning. The edge device (e.g., AR headset, mobile phone, or wearable) performs initial neural network layers locally, while the remaining layers are offloaded to the edge server. Intermediate features are transmitted over a wireless channel. Feedback on network conditions is used to adapt the split layer dynamically to optimize performance under resource constraints.
  • Figure 2: Transmission delay across different split layers under varying channel conditions. The red error bars indicate the mean and range (max–min) of delay measurements over multiple frames. Background color represents the corresponding channel gain (in dB).
  • Figure 3: End-to-end delay breakdown for different split layers in the collaborative inference pipeline. Blue and red bars represent computation delay at the edge device and edge server, respectively, while green error bars indicate the mean and range of transmission delay across multiple channel realizations. We assume negligible server-side transmission delay since the downstream payload (logits/labels) is small compared to the available channel capacity.
  • Figure 4: Energy consumption breakdown across different split layers. Blue bars represent cumulative computation energy on the edge device, and red error bars indicate the mean and range of transmission energy measured over multiple frames. Early splits incur higher transmission energy due to larger activation sizes, while deeper splits increase computation energy as more layers are processed locally.
  • Figure 5: Raspberry Pi 4 experimental setup demonstrating real-world edge constraints. The limited computational resources (4 GB RAM, ARM Cortex-A72) and thermal constraints (visible heat sinks) directly motivate our constraint-aware optimization approach. Camera module and wireless connectivity represent typical split-inference deployment scenarios where energy and latency budgets are critical.
  • ...and 5 more figures
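The delay and energy breakdowns described in the captions for Figures 3 and 4 can be mimicked with a small toy model. This is a sketch under stated assumptions: every number in the per-layer profile, the bandwidth, the noise power, and the device power draw is invented for illustration, and the Shannon-capacity rate is a common modelling choice rather than something the captions specify.

```python
import math

# Hypothetical per-split profile (all values invented for illustration).
# Index k means the first k layers run on the device; index 0 sends the raw input.
ACT_BITS = [1.2e6, 6.4e6, 3.2e6, 1.6e6, 0.4e6, 8e3]       # payload size (bits)
DEV_DELAY = [0.000, 0.020, 0.045, 0.075, 0.110, 0.150]    # cumulative device compute (s)
SRV_DELAY = [0.015, 0.013, 0.010, 0.007, 0.003, 0.000]    # remaining server compute (s)
DEV_POWER_W = 2.0       # assumed device compute power draw (W)
BANDWIDTH_HZ = 20e6     # assumed channel bandwidth (Hz)
NOISE_W = 1e-9          # assumed noise power (W)

def rate_bps(p_tx_w, gain):
    """Shannon-capacity rate for transmit power p_tx_w (W) and linear channel gain."""
    return BANDWIDTH_HZ * math.log2(1.0 + p_tx_w * gain / NOISE_W)

def delay_energy(split, p_tx_w, gain):
    """End-to-end delay (s) and device energy (J) for one split and power setting."""
    t_tx = ACT_BITS[split] / rate_bps(p_tx_w, gain)
    delay = DEV_DELAY[split] + t_tx + SRV_DELAY[split]
    # Device energy: local computation plus uplink transmission.
    energy = DEV_POWER_W * DEV_DELAY[split] + p_tx_w * t_tx
    return delay, energy

def feasible_points(gain, d_max, e_max, powers):
    """Exhaustively list (split, power, delay, energy) tuples meeting both budgets."""
    out = []
    for split in range(len(ACT_BITS)):
        for p in powers:
            d, e = delay_energy(split, p, gain)
            if d <= d_max and e <= e_max:
                out.append((split, p, d, e))
    return out
```

For example, `feasible_points(1e-6, 0.2, 0.3, [0.05, 0.1, 0.2])` enumerates the configurations that fit a 0.2 s delay budget and a 0.3 J energy budget under a fixed channel gain. Lowering the gain increases both transmission delay and transmission energy for every configuration, so the feasible set can only shrink, which is why the split decision has to adapt to channel conditions.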

Theorems & Definitions (1)

  • Theorem 1: Cumulative Regret