Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks

Hyun-Ho Choi, Kangsoo Kim, Ki-Ho Lee, Kisong Lee

TL;DR

The paper tackles real-time 3D human pose estimation in multi-device mobile edge computing (MEC) networks. It proposes a cooperative inference framework in which end devices running lightweight models use dual confidence thresholds to identify ambiguous images and offload only those images to a more powerful edge server for re-inference. The authors show that minimizing the mean per-joint position error (MPJPE) is equivalent to maximizing the sum of per-device inference accuracies, which enables a two-stage, low-complexity optimization that jointly tunes each device's confidence thresholds and transmission time subject to an end-to-end delay requirement. Experiments in a virtual MEC setup show substantial MPJPE reductions relative to device-centric and server-centric baselines while meeting the delay requirement, with accuracy approaching that of the server-centric baseline at lower delay. The approach is applicable to real-time 3D pose tasks in MEC and provides a foundation for extension to real-world, multi-person deployments.

Abstract

Accurate and real-time three-dimensional (3D) pose estimation is challenging in resource-constrained and dynamic environments owing to its high computational complexity. To address this issue, this study proposes a novel cooperative inference method for real-time 3D human pose estimation in mobile edge computing (MEC) networks. In the proposed method, multiple end devices equipped with lightweight inference models employ dual confidence thresholds to filter ambiguous images. Only the filtered images are offloaded to an edge server with a more powerful inference model for re-evaluation, thereby improving the estimation accuracy under computational and communication constraints. We numerically analyze the performance of the proposed inference method in terms of the inference accuracy and end-to-end delay and formulate a joint optimization problem to derive the optimal confidence thresholds and transmission time for each device, with the objective of minimizing the mean per-joint position error (MPJPE) while satisfying the required end-to-end delay constraint. To solve this problem, we demonstrate that minimizing the MPJPE is equivalent to maximizing the sum of the inference accuracies for all devices, decompose the problem into manageable subproblems, and present a low-complexity optimization algorithm to obtain a near-optimal solution. The experimental results show that a trade-off exists between the MPJPE and end-to-end delay depending on the confidence thresholds. Furthermore, the results confirm that the proposed cooperative inference method achieves a significant reduction in the MPJPE through the optimal selection of confidence thresholds and transmission times, while consistently satisfying the end-to-end delay requirement in various MEC environments.
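The dual-threshold filtering described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the decision rule, so the three-way split below is an assumption: a confidence score at or above the high threshold keeps the device result, a score below the low threshold is handled locally as a confidently poor estimate, and anything in between is treated as ambiguous and offloaded. The names `Route`, `route_image`, and `offload_ratio` are hypothetical and do not come from the paper.

```python
from enum import Enum


class Route(Enum):
    ACCEPT_LOCAL = "accept_local"  # device result is trusted as-is
    OFFLOAD = "offload"            # ambiguous image: re-infer on the edge server
    REJECT_LOCAL = "reject_local"  # assumed handling of confidently poor results


def route_image(confidence: float, theta_l: float, theta_h: float) -> Route:
    """Classify one image by its device-side confidence score.

    Assumption: scores in [theta_l, theta_h) count as ambiguous.
    """
    if theta_l > theta_h:
        raise ValueError("theta_l must not exceed theta_h")
    if confidence >= theta_h:
        return Route.ACCEPT_LOCAL
    if confidence < theta_l:
        return Route.REJECT_LOCAL
    return Route.OFFLOAD


def offload_ratio(confidences, theta_l: float, theta_h: float) -> float:
    """Fraction of images that would be sent to the edge server."""
    if not confidences:
        return 0.0
    offloaded = sum(
        route_image(c, theta_l, theta_h) is Route.OFFLOAD for c in confidences
    )
    return offloaded / len(confidences)
```

Under this sketch, widening the gap between the two thresholds sends more images to the server, which is one way to picture the trade-off between MPJPE and end-to-end delay that the abstract reports.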

Paper Structure

This paper contains 11 sections, 1 theorem, 40 equations, 12 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Minimizing the MPJPE is equivalent to maximizing the sum of the inference accuracies for all devices, where $\pmb{\theta}\triangleq\{\pmb{\theta}_{l}, \pmb{\theta}_{h}, \pmb{\theta}_{s}\}$.
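For readers unfamiliar with the metric in Lemma 1, MPJPE is conventionally computed as the mean Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below follows that standard definition for a single pose; the function name `mpjpe` and the input layout (one `(x, y, z)` tuple per joint) are assumptions for illustration, not the paper's implementation.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean per-joint position error for one pose.

    Both arguments are sequences of (x, y, z) joint coordinates in the
    same units and the same joint order (an assumed input layout).
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("poses must be non-empty and have the same joint count")
    # Average the Euclidean distance over all joints.
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)
```

For example, with predicted joints `[(0, 0, 0), (1, 0, 0)]` and ground-truth joints `[(0, 0, 3), (1, 4, 0)]`, the per-joint errors are 3 and 4, so `mpjpe` returns 3.5.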

Figures (12)

  • Figure 1: System model for proposed cooperative inference method.
  • Figure 2: Operational flow of proposed cooperative inference method.
  • Figure 3: Confusion matrix analysis for proposed cooperative inference method at (a) end device $i$ and (b) edge server.
  • Figure 4: Virtual experimental environment implemented in Unity.
  • Figure 5: PDFs of average confidence scores on the device and server.
  • ...and 7 more figures

Theorems & Definitions (3)

  • Lemma 1
  • Remark 1: Proof of Convergence
  • Remark 2: Analysis of Complexity