
More is Different: Prototyping and Analyzing a New Form of Edge Server with Massive Mobile SoCs

Li Zhang, Zhe Fu, Boqing Shi, Xiang Li, Rujin Lai, Chenyang Yang, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu

TL;DR

The paper investigates a novel edge-server architecture, SoC Cluster, which aggregates tens of mobile SoCs in a compact 2U rack to improve energy efficiency for edge workloads. It presents concrete hardware prototyping (60 Snapdragon 865s), a benchmark suite for video transcoding and DL serving, and a comprehensive measurement study comparing against traditional Intel+NVIDIA servers. Key findings show substantial energy efficiency gains for inference workloads (up to ~6.5× throughput per energy) and clear advantages of hardware codecs for transcoding, while highlighting latency and cross-SoC collaboration challenges that require software and networking improvements. The longitudinal study reveals rapid performance improvements in mobile co-processors over time, suggesting growing viability of SoC-based edge clusters, with practical guidance on cost, workload fit, and architectural enhancements for wider deployment.

Abstract

Huge energy consumption poses a significant challenge for edge clouds. In response to this, we introduce a new type of edge server, namely SoC Cluster, that orchestrates multiple low-power mobile system-on-chips (SoCs) through an on-chip network. For the first time, we have developed a concrete SoC Cluster consisting of 60 Qualcomm Snapdragon 865 SoCs housed in a 2U rack, which has been successfully commercialized and extensively deployed in edge clouds. Cloud gaming emerges as the principal workload on these deployed SoC Clusters, owing to the compatibility between mobile SoCs and native mobile games. In this study, we aim to demystify whether the SoC Cluster can efficiently serve more generalized, typical edge workloads. Therefore, we developed a benchmark suite that employs state-of-the-art libraries for two critical edge workloads, i.e., video transcoding and deep learning inference. This suite evaluates throughput, latency, power consumption, and other application-specific metrics like video quality. Following this, we conducted a thorough measurement study and directly compared the SoC Cluster with traditional edge servers with regard to electricity usage and monetary cost. Our results quantitatively reveal when and for which applications mobile SoCs exhibit higher energy efficiency than traditional servers, as well as their ability to proportionally scale power consumption with fluctuating incoming loads. These outcomes provide insightful implications and offer valuable direction for further refinement of the SoC Cluster to facilitate its deployment across wider edge scenarios.

Paper Structure (20 sections, 17 figures, 7 tables)

Figures (17)

  • Figure 1: CDF of resource subscription of VMs in Microsoft Azure [cortez2017resource] and Alibaba ENS [ens]. Approximately 66% of Azure VMs and 36% of Alibaba ENS VMs can be accommodated within a mobile SoC evaluated in this study (i.e., a Qualcomm Snapdragon 865 chip with 8 CPU cores, 12 GB memory and 256 GB storage).
  • Figure 2: The architecture of SoC Cluster.
  • Figure 3: Hardware management in SoC Cluster.
  • Figure 4: A manufactured SoC Cluster and a PCB.
  • Figure 5: The network throughput of an in-the-wild SoC Cluster that serves cloud gaming workloads over 38 hours. The server is randomly picked from an edge site of one edge service provider. Full network capacity: 20 Gbps.
  • ...and 12 more figures
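
The Figure 1 caption reports the share of VMs that "can be accommodated within a mobile SoC". One plausible reading, sketched below, is a per-resource comparison: a VM fits if its subscribed CPU cores, memory, and storage are each no larger than what one Snapdragon 865 offers (8 cores, 12 GB, 256 GB, as stated in the caption). This criterion is an assumption, and the paper may define accommodation differently. The `Resources` type, the function names, and the sample VM list are hypothetical and are not taken from the Azure or Alibaba ENS traces.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Resources:
    """Resource subscription of a VM, or capacity of a host."""
    cpu_cores: int
    memory_gb: float
    storage_gb: float


# Capacity of one Snapdragon 865 SoC as given in the Figure 1 caption.
SOC_CAPACITY = Resources(cpu_cores=8, memory_gb=12, storage_gb=256)


def fits(vm: Resources, capacity: Resources = SOC_CAPACITY) -> bool:
    """Assumed criterion: every subscribed resource is within the capacity."""
    return (
        vm.cpu_cores <= capacity.cpu_cores
        and vm.memory_gb <= capacity.memory_gb
        and vm.storage_gb <= capacity.storage_gb
    )


def fraction_fitting(vms: Iterable[Resources],
                     capacity: Resources = SOC_CAPACITY) -> float:
    """Share of VMs that fit within a single host of the given capacity."""
    vms = list(vms)
    if not vms:
        return 0.0
    return sum(fits(vm, capacity) for vm in vms) / len(vms)


# Hypothetical VM subscriptions, for illustration only.
SAMPLE_VMS = [
    Resources(cpu_cores=2, memory_gb=4, storage_gb=64),
    Resources(cpu_cores=4, memory_gb=8, storage_gb=128),
    Resources(cpu_cores=8, memory_gb=12, storage_gb=256),
    Resources(cpu_cores=16, memory_gb=64, storage_gb=512),
]

if __name__ == "__main__":
    print(f"{fraction_fitting(SAMPLE_VMS):.0%} of sample VMs fit in one SoC")
```

Applied to a real trace, `fraction_fitting` would yield the kind of percentage quoted in the caption, provided the trace exposes all three resource dimensions per VM.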