
EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge

ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu

TL;DR

EdgeSight targets cost-efficient modeless inference at the edge by combining three pieces: confidence-based hierarchical inference, an unreliable frontend transport with lossy JPEG recovery, and hardware-aware prototypes. A lightweight frontend model answers a request directly when its confidence clears a threshold and otherwise falls back to a more accurate backend model, which reduces unnecessary model loading and latency. The lossy inference design tolerates packet loss without catastrophic accuracy loss. A streaming FPGA frontend prototype demonstrates power savings and latency improvements, and an end-to-end GPU-based implementation shows the design is practical. The evaluation reports up to 1.6x improvement in P99 latency and up to 3.34x power reduction on the FPGA, along with reduced model-swapping costs, for edge deployments with diverse accuracy requirements and volatile networks.

Abstract

Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements. While prior studies have focused on modeless inference within data centers, this paper tackles the pressing need for cost-efficient modeless inference at the edge, particularly under its unique constraints of limited device memory, volatile network conditions, and restricted power consumption. To overcome these challenges, we propose EdgeSight, a system that provides cost-efficient modeless serving for diverse DNNs at the edge. EdgeSight employs an edge-data center (edge-DC) architecture, utilizing confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. Additionally, it supports lossy inference in volatile network environments. Our experimental results show that EdgeSight outperforms existing systems by up to 1.6x in P99 latency for modeless services. Furthermore, our FPGA prototype demonstrates similar performance at certain accuracy levels, with a power consumption reduction of up to 3.34x.
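
The confidence-based fallback described above can be illustrated with a small sketch. This is not EdgeSight's implementation: the function names (`softmax`, `cascade_predict`), the use of the maximum softmax probability as the confidence score, and the toy models are all assumptions made for illustration. The idea it shows is that a single threshold on the frontend's confidence decides whether a request is answered at the edge or forwarded to the more accurate backend, so different accuracy requirements could map to different thresholds rather than to different models.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping an input to a list of class logits.
Model = Callable[[Sequence[float]], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(
    x: Sequence[float],
    frontend: Model,
    backend: Model,
    threshold: float,
) -> Tuple[int, str]:
    """Return (predicted class, which model answered).

    Assumption: confidence is the maximum softmax probability of the
    frontend model. The paper's exact confidence measure is not given here.
    """
    probs = softmax(frontend(x))
    confidence = max(probs)
    if confidence >= threshold:
        # Frontend is confident enough: answer locally, skip the backend.
        return probs.index(confidence), "frontend"
    # Otherwise fall back to the more accurate (and more expensive) backend.
    backend_probs = softmax(backend(x))
    return backend_probs.index(max(backend_probs)), "backend"


# Hypothetical toy models: the frontend echoes its input as logits,
# the backend always strongly prefers class 2.
toy_frontend: Model = lambda x: list(x)
toy_backend: Model = lambda x: [0.0, 0.0, 5.0]

confident = cascade_predict([4.0, 0.0, 0.0], toy_frontend, toy_backend, 0.9)
uncertain = cascade_predict([1.0, 0.9, 0.0], toy_frontend, toy_backend, 0.9)
relaxed = cascade_predict([1.0, 0.9, 0.0], toy_frontend, toy_backend, 0.4)
```

In this sketch, lowering the threshold (as in `relaxed`) keeps more requests on the frontend at the cost of accuracy, while raising it sends more requests to the backend.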

Paper Structure

This paper contains 30 sections, 16 figures, 1 table.

Figures (16)

  • Figure 1: Edge-DC Deployment
  • Figure 2: Serving different requirements with two models
  • Figure 3: Impact of Packet loss
  • Figure 4: Breakdown of inference latency with 0ms network delay
  • Figure 5: EdgeSight Overview
  • ...and 11 more figures